INTRODUCTION
This presentation [1] is designed to stimulate discussion of new computer-based
technology which is now available and which will become available to chemists, applied to
chemistry, and most importantly, used by chemists in their everyday activities from today
to well into the beginning of the next century. Starting as a presentation at a EUSIDIC
meeting in Heidelberg in 1988, this paper has evolved over the past five years, and no
doubt will continue to evolve as new phenomena stimulate changes in the habits and
activities of chemists.
As computer technology has developed, the use of computers in chemistry has expanded from simple arithmetic calculations to very broad areas of chemistry. This paper examines some of these areas and tries to summarize the current state of the use of computers in chemistry, along with what the author believes that use will be a little over a decade from now, roughly in the years 2005 to 2010.
BACKGROUND
Computers, like any other technological tool, have become integrated gradually
into the daily routine of chemists. The widespread use of computers in chemistry has
clearly been handicapped by a number of factors, a major one being the lack of
familiarity with this new technology on the part of chemists and managers in the field of
chemistry. This is true from academia to government to industry. In working to locate
supporting facts for this article I heard this belief mentioned a number of times. Phrases
like you need to "raise a generation of people who are comfortable with these tools" [2],
and "raise a generation of advocates" [3] came from professionals in the field of market
research. Thus I concluded that wide-scale and heavy use of computers by chemists has
not yet started.
At present, the routine use of computers in support of research and production in
a chemistry lab or office, other than for word processing, spreadsheets, and literature
searching, is low [4] (defined as less than 25% of the potential users). Why is this the
case? There are a few hundred thousand chemists in the USA and many of them have
computers. It is generally thought that most (>90%) of these individuals have
computers, virtually all of which are IBM PC's and clones or Apple Macintoshes, the
remainder being workstations from Sun, Silicon Graphics, DEC, or other manufacturers.
It has not been possible to obtain any definitive information on the number of scientists or
chemists with PC's. Marketing surveys have not addressed such questions [5].
If one combines these numbers with those in other developed countries one could
estimate that there are today about 800,000 chemists [5] as a potential market for various
computers and computer systems and for software specifically designed to support the
needs of the chemist. What I hope this paper will address is why, if there are more
chemists today than 30 to 40 years ago, the use of chemical information by
end users is lower (fewer books bought, fewer CA subscriptions, etc.). Logic would dictate
that with more chemists and more information there should be more access and use. Since
this is not the case, what is likely to help bring the usage back to a logical or reasonable
level, based on the size of the audience? This paper will examine some possible reasons
why large numbers of chemists have not yet decided that computers are a necessary tool
for the conduct of their everyday research and administrative work, thus explaining the lack
of extensive use of computers and related computer technology.
Please note that when the qualitative phrases "few" or "low" (as defined in
reference 4) are mentioned for the overall use of a particular piece of computer hardware, a
computer program, or a computerized database, the phrase is meant in comparison to the
overall potential purchase and use by some 800,000 potential users (chemists) worldwide.
With the exception of one series of marketing studies in the area of computational
chemistry [3], there have, to date, been no published studies of the use of computers and
related computer systems by the end users in the chemical community. (Studies on the
use of computers and databases in libraries are not regarded here as end-user studies.)
The reason for this is the small size of the current market which does not justify the
investment for such a survey [2,3]. Thus the reader will have to accept the lack of hard
statistics for many of the statements presented here.
While selling a total (over the lifetime of the program) of a few hundred or even a
few thousand molecular modeling or structure drawing programs is, today, a major
accomplishment in the business of software for chemistry, it is a minor event relative to
the daily sales of word processing, database management, spreadsheets, and other such
programs. The lack of any public software companies in chemistry (i.e., software
companies devoted exclusively to selling software for chemistry and whose stock is
available to the public) is indirect evidence to support this position that there is, at
present, no major financial incentive to go into this business.
Before proceeding to the main thrust of the speculations into the future of
computers in chemistry it is important to note that there are some labs as well as areas of
chemistry in which the use of computers is very high. As mentioned above, the area of
computational chemistry is clearly one of these areas. While it is estimated there are 1000
sites worldwide with some 2000 academic and industrial chemists now involved in this
area [3, page 4], this is less than 1% of the chemists in the world. In almost all areas of
spectroscopy computers are heavily used to acquire and analyze data. A reader involved
in these areas of chemistry would certainly not fall within the "low" range of computer
use. However, I believe these "pockets" of high computer use, when averaged over the entire
chemistry community, are consistent with the levels of usage stated here.
COMPUTER AND CHEMICAL INFORMATION ISSUES
Table 1 summarizes both the issues which are to be discussed here as well as the
current and predicted level of activities in these areas. Space in this journal does not
permit a full analysis of all of these topics. Thus a few representative issues will be
mentioned. Tables 2-12 list details for many of these issues.
Topic | Today | 2000 |
Computer Literacy | Low-Moderate | Moderate - High |
Computer Chip technology | Intel 386, 486; Motorola 68000; RISC | Intel 986; Motorola 98000; RISC |
Operating Systems | DOS, UNIX, Windows, OS/2, Macintosh | Mostly enhanced, friendly, UNIX |
Telecommunications | Moderate usage; 2400-9600 baud speeds | Heavy usage; 1 million++ baud speeds |
Interfaces | Offensive/exacting | Transparent/Voice based |
Graphics | Low usage in most software | Predominant usage in most software |
CD-ROM | Low end-user usage | High end-user usage |
Chemical Information | Raw & unprocessed | Processed & analyzed |
Online usage for chemistry | Low | Low |
SDI | Manual or by post | Electronic |
Databases | Bibliographic | Numeric & factual |
Beilstein | E-V Series being published | E-V Series still being published |
Chemical catalogs | Online searching of catalogs | Online ordering from catalogs |
Chemical Identification | CAS RN & BRN | Chemical structure |
Molecular Modeling | Few | Some |
Educational software | Random; not integrated with textbooks | Integrated with textbooks |
Publishing | Semi-Electronic | Mostly Electronic |
Books | Thought of as probable dinosaurs | Thought of as probable dinosaurs |
Instruments | Semi-automated | Fully-automated with ISO data transfer standards |
Today -
Networks being used routinely by many chemists. BITNET, CSNET, EARNET,
Internet, JANET, NORDUNET, SPAN, and other networks used by scientists a few times
per week. Some companies have internal networks for many of their end-user PC's.
Telecommunications speeds in the range of 2400 - 9600 baud.
2000 -
Networks and e-mail used all of the time. Automatic interfacing between all
networks routine. Automatic logins for mail done every day before the scientist comes to
work. E-mail automatically re-routed as you travel to meetings, holidays, and home.
Local-area and wide-area networks are widely available within most organizations.
Large databases more readily available within organizations. Telecommunications speeds
in the range of 2.4 million + baud.
Today -
Programs in their infancy.
2000 -
Voice control for input with lots of graphics. Standards for graphics and data are
common. IUPAC, CODATA, ASTM, ISO, and other organizations agree on data
transfer protocols.
Today -
Usage in its infancy.
Lack of compatibility.
Lack of standards.
FAX transmission in its infancy.
2000 -
Graphics software packages are widespread.
Graphics routinely sent electronically.
(Microsoft Chart)
PC's have built in FAX's for receiving and transmitting
chemical structures and tables of data.
Today -
Chemistry CD-ROM products are rare today.
Low density (600 MB) CD's.
e.g., Aldrich MSDS, Beilstein Current Facts, Canadian Toxicity Databases, NIST Mass Spectrometry, CAS 12th Collective Index, C&H - Dictionary of Natural Products
2000 -
New products and high density CD-ROM's (6 Billion + Bytes)
Heilbron Dictionary of Organic Chemicals
CRC Handbook, CAS volume(s) on CD-ROM
Subsets of CAS, Beilstein, & Gmelin
Collections of numeric databases
(e.g., IR, NMR, & MS databases from NIST & Chemical Concepts)
Most Journals
Today -
Most information is raw, unprocessed, and un-evaluated.
CAS, Beilstein, VINITI - most abstracting and data extraction is done in-house
2000 -
Greater reliance on processed and evaluated data, such as Beilstein,
Gmelin, IUPAC data series, CRC Handbooks.
CAS, Beilstein, VINITI - economic factors will cause most abstracting and data
extraction to be done by free-lance workers at home. Articles and abstracts all sent
electronically from abstractor to abstracting service.
Today -
Popular feature for vendors. Results mailed to
customers or left for online downloading. [6]
2000 -
Popular feature for vendors. Results automatically sent
electronically to customers' PC via networks. Customers can order SDI articles of
interest electronically.
Today -
Still in the age of Bibliographic databases.
2000 -
Second generation of databases - numeric and factual data overtake bibliographic
databases in usage. Usage increases as scientists realize need for (good) data for
dry lab work (modeling, etc.)
Today -
Lots of printed catalogs. A few catalogs on disk or CD-ROM (e.g. Aldrich and
Kodak)
2000 -
Catalogs on CD-ROM's. Users order directly over the phone from their labs.
Ordering by credit card is routine.
Today -
CAS Registry Number reigns supreme.
2000 -
With chemical structures in all important databases, special identification numbers
have little use. Standard molecular data formats allow for interfacing between all
public and private files.
Today -
Random Usage. Software used in teaching high school and college chemistry does
not come with textbooks, but as separate products.
2000 -
Software integrated into textbooks [7]
(G. D. Wiggins, "Chemical Information Sources" [8])
PC floppy disk programs part of all undergraduate texts.
Chemical Information courses have PC based tutorials and practical online sessions.
Today -
Journal articles are almost the only socially acceptable form of communication and
reward/promotion. Some scientific manuscripts submitted in electronic form, but process
is neither widespread nor practical. Virtually all refereeing done by postal mail,
with some done by FAX.
2000 -
Printed journals still predominate, but electronic data submissions, electronic
journals, and software programs are now part of the academic, government, and industrial
chemists' reward/promotion system.
Leading journal publishers use electronic submissions to speed up processing of
publications, easier data extraction, and overall quality improvement.
Electronic (FAX and e-mail) peer review predominates.
The heart of the matter is computer literacy. Growing up with, being familiar
with, and making regular use of computers and computer systems of information will not
become the norm without the necessary atmosphere and background being part of your
upbringing. As mentioned in the introduction, the initial use of computers by chemists
(and other scientists) was limited to performing simple calculations. Hence it is no
surprise that the area of chemistry in which computers have been used is primarily
computational chemistry. But the usage even in this area is low. As Casale and Gelin [3]
point out, "as a scientific discipline, computational chemistry is in its infancy". The
current state of education in many parts of the world will make further usage difficult.
However I would hope that in college and graduate school there would be sufficient
competence to train the upcoming generation of chemists to become very familiar with
computers, through the introduction of computer application courses taught by chemists
in chemistry departments. Without an increase in the level of computer literacy the
remaining issues are pretty much irrelevant.
There are two facets in using computers; writing programs and using programs.
The writing of programs is really a rather limited issue. A computer is a tool. When a
chemist gets too involved in the tool then he or she is, more often than not, no longer
doing chemistry. What matters is using programs. To do this effectively and properly
you need to know what a computer can do for you in the area in which you need to solve
a problem. I don't need to be an automotive engineer to know that to get somewhere by
car, I need a car, and need to know how to drive it, and know where I am going. The
same is true with computers. Understanding what a computer can, or cannot, do is the
important step. Then either finding software and hardware to do it, or getting someone
to produce what is needed to get the job done, is relatively simple. I believe that virtually
no chemists use computers as an end in themselves and that chemists should use
computers as one of many tools to do their job, but only if the computer is the most
effective way, and not a barrier, to do the job better, more effectively, and more
efficiently.
Most chemists use computers only for administrative purposes (like writing a manuscript which may or may not include chemical diagrams). I would argue that the reason for the lack of extensive use of computers is that the majority of computers (PC's of the 8086, 8088, 286, and 386 vintage) which are readily available to the chemist are of insufficient capacity and capability to do effective work other than word processing, structure drawing, and spreadsheet calculations. (Moreover, without adequate computers available, there has been no incentive to develop the software for chemists.) Until very recently the computers with the necessary cpu speed (e.g., 486 cpu PC's and DEC, Sun, Silicon Graphics, and other workstations) and available disk space to do a variety of scientific applications (modeling, quantum chemistry calculations, spectral interpretation and prediction, database searching of spectral data, image analysis, etc.) were much too expensive for most individual chemists to have on their desks or in their labs. As recently as 2 years ago a computer with an Intel 286 cpu and 40 MB hard disk was considered a state-of-the-art computer system. Almost nobody with a PC would keep a mass spectral database [9] and search system, requiring some 23 MB of hard disk space, on a computer system with a 40 MB hard disk. Today, to run a modern PC operating system (DOS 5 with Windows, or OS/2) one needs at least an Intel 486 cpu (with a 50-66 MHz clock) and 300-500 MB of disk space. A recent article from a monthly computer magazine [10] added up the disk storage requirements for a little over a dozen pieces of popular business-oriented software, and the total came to almost 100 MB, not including any of the disk space required for program swapping or any of the space needed for files of data and information.
In the next few years chemists will be able to replace existing low power (e.g., 286
or equivalent type of PC) computers, or buy new ones, with the computer power of an
Intel 486/586 (Pentium) cpu (or the Sun, Silicon Graphics, DEC, or other equivalent) with
sufficient disk storage space to readily run complex and powerful programs.
The low usage of computers by chemists in the recent few years may be
attributable to the lack of affordable adequate hardware, but it will take a number of
years for this new and more powerful hardware to work its way into the system and into
everyday use. Furthermore, unless software prices follow those of hardware, it is
difficult to believe that many chemists will pay $2000 for a computer system and then
spend thousands of dollars for additional software packages. Only low cost, high volume
software is likely to succeed in the future. An experiment in mass marketing to the
chemistry community is now being undertaken by Autodesk, which is hoping to increase
the number of scientists using PC-based molecular modeling packages from a world-wide
total of 5,000 to 100,000 or more [3]. Included in Autodesk's effort is a multi-million
dollar grant program for encouraging university use of their HyperChem molecular
modeling software product.
TELECOMMUNICATIONS, NETWORKS, AND E-MAIL
Computers are used for electronic communication by a small but growing number of chemists. Among the reasons for the low usage are the lack of modems and dedicated phone lines, as well as the difficulty of finding where people or information are located and of initiating communication. There is also the lack of computer addresses on the necessary computer networks (Internet, BITNET, MCInet, Sprintnet, Compuserve, etc.) and the problem of connecting between networks.
With computer networks, there is no readily available phone book and no operator. While work on this problem is moving forward, a solution is still years away.
Today, practically all numbers (actually computer network addresses) are unlisted.
However I can see changes coming. A few years ago a business card had a name, title,
address, and phone number. Today many business cards have FAX and Internet
addresses. This is part of computer literacy. This is progress. I believe it will still take
years for chemists to make routine use of Internet and the related networks connected to
it. From discussions with a number of people, the estimated usage of Internet by
chemists was in the range of 10-15% of those who have a computer and can access
computers outside their organizations [4]. Perhaps with the new political administration
in Washington DC promoting e-mail addresses on Internet (e.g., Bill Clinton can be
reached at PRESIDENT@WHITE-HOUSE.GOV) computer literacy will move forward a
bit faster.
Electronic mail, or e-mail, is slowly (due to a lack of knowledge about it) becoming a
new type of network for chemists, as well as other scientists [11]. It is not an "old-boy"
network or "invisible college" because it allows anyone accessing these systems to be an
"equal" of anyone else on the system. Electronic bulletin boards, discussion groups, and
news groups, dealing with all subjects, are slowly sprouting up everywhere in all areas,
each with dozens to hundreds of users. Of almost 800 such news groups surveyed by
Kovacs, fewer than 100 are in the physical sciences and of these only 20 are in chemistry
[12]. Again a small percentage is observed when the topic relates to chemistry. For
some examples of Chemistry list-servers or news groups see page of Heller's recent paper
[13].
In a 1990 survey [14] it was estimated that some 10% of the overall working
population in the USA and Canada uses e-mail systems, vs. only 1.3% in Europe. The
ACS Computers in Chemistry (COMP) Division now distributes its newsletter via e-mail
on Internet, as well as in hard copy. In mid-1992 a little less than 10% of the COMP
members received the newsletter electronically [15], a number comparable with the
survey mentioned above. By 1994 this survey estimated the usage in North America
would grow to almost 29%, while in Europe the usage would expand to just under 5%.
Certainly there are cultural differences between those two areas in the use of the
telephone and modems, but the European PTT's and their policies add to the difficulty of
use. I would expect the e-mail usage in chemistry to be higher today, and that e-mail will
become a necessity by the end of this decade. The low cost, ease of use, and ability
to send written information readily to colleagues around the world make this an ideal
replacement for the existing "old-boy" network of phone calls, meetings, and letters. For
anyone, from a Nobel prize winner to an undergraduate student, to be able to
communicate freely and easily, and to see what topics and areas are of current interest
should improve scientific communication and research work. E-mail will be able to
reduce the time for papers to be sent back and forth. Once the graphics problem is
solved (both the technical standard for graphics and the speed of transmission of
graphics) and put into practical use, e-mail will allow for real electronic journals [16].
This area has a great and important future for chemists throughout the world. For more
details about networking, e-mail and electronic publishing please refer to the paper by
Garson in these proceedings [17].
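The survey figures quoted above imply a compound annual growth rate, which can be worked out as a rough illustration. This is only a sketch: it treats "almost 29%" and "just under 5%" as 29% and 5%, assumes smooth year-over-year growth from 1990 to 1994, and the function name is hypothetical.

```python
def implied_annual_growth(start_share, end_share, years):
    """Constant yearly growth rate that takes start_share to end_share."""
    return (end_share / start_share) ** (1 / years) - 1

# Shares of the working population using e-mail, taken from the survey [14].
# The 1994 values are the survey's projections, rounded up as an assumption.
north_america = implied_annual_growth(10.0, 29.0, 4)
europe = implied_annual_growth(1.3, 5.0, 4)

print(f"North America: about {north_america:.0%} growth per year")
print(f"Europe:        about {europe:.0%} growth per year")
```

Under these assumptions the projected European growth rate is the faster of the two, even though the European share remains far smaller in absolute terms.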
INTERFACES
Another major problem with computer programs is the difficulty associated with
their use. Pacman and Nintendo (the popular video games of the 1980's and early 1990's)
never came with manuals. Some manuals seem more designed for weight lifting than
explaining how to use a particular computer program. Rarely are manuals available in
computer readable form [18]. In computerized form a manual could be
searched for any word of interest. Installing and running
programs is a major energy barrier for most people. My philosophy is that if I must read
the manual to use the computer program, I probably am better off without it. There is
no way someone can become and remain proficient with a wide variety of programs,
remembering what each does and how to perform particular tasks, as well as doing their
assigned job as a chemist. Few people use their VCR's to record TV shows because they
can't figure out how to do it. This even created a market for a device which
automatically sets up the VCR to record based on a set of 5 digits you type into a device.
The 5 digits are published in newspapers in the USA every day next to each TV program
listing.
Table 1 speaks of today's interfaces as being frightful and difficult. If people are
not comfortable with a tool they will not use it. As computers become more powerful
and better software engineers graduate and get jobs, one can only hope and expect that
the interfaces in the year 2000 will become transparent and even voice based [19]. One
way to accomplish this is through the extended use of graphics in computers. Today the
use of high resolution graphics (1024 x 1024 pixels) is low. Color screen size is small (12 -
14 inches) and expensive. By the year 2000 I would expect that every computer will have
a 20 inch or larger color monitor with at least 2048 x 2048 resolution, along with a color
laser printer or plotter with the same capabilities. With the cost of hardware decreasing
this equipment should be available to most scientists in the coming decade. Related to the
need for high-resolution graphics is the problem of how to transmit all the
information quickly enough to be of practical use. Today's modem speeds of 2400 - 9600
baud are much too slow for graphics to be practical. With the current trend of better
networks and telecommunications, it seems reasonable to believe that the speeds of
transmissions needed for chemistry graphics will be available in the next few years.
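To make the speed problem concrete, the time needed to send a single high-resolution screen can be estimated. The sketch below assumes an uncompressed 1024 x 1024 image at 8 bits per pixel, no protocol overhead, and treats one baud as one bit per second; none of these assumptions comes from the text, and the function name is hypothetical.

```python
def transfer_seconds(width_px, height_px, bits_per_pixel, line_bits_per_second):
    """Time to send one uncompressed image over a serial line."""
    total_bits = width_px * height_px * bits_per_pixel
    return total_bits / line_bits_per_second

# Assumptions: 8 bits per pixel (256 colors), no compression, no overhead.
# 2400 and 9600 are today's modem speeds; 1,000,000 is the order of
# magnitude projected in Table 1 for the year 2000.
for speed in (2400, 9600, 1_000_000):
    seconds = transfer_seconds(1024, 1024, 8, speed)
    print(f"{speed:>9} bit/s: {seconds / 60:6.1f} minutes")
```

Under these assumptions one screen takes close to an hour at 2400 baud and roughly a quarter of an hour at 9600 baud, but only seconds at the projected speeds.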
CHEMICAL IDENTIFICATION AND STANDARDS
In the area of chemical identification it has taken some 20 years for the CAS
(Chemical Abstracts Service) Registry Number (CAS RN) to be used widely and routinely
in databases and in searching for chemicals. While the CAS RN now reigns supreme for
chemical identification, it suffers from the lack of any inherent intellectual value; it is,
like the US Social Security number, an idiot number (notwithstanding its check digit),
assigned sequentially over time. A larger number just means it entered the CAS Registry
system more recently. In the past few years optical scanning devices, coupled with
advances in character and vector recognition, have led to the development of computer
programs (see, for example, the work of Johnson et al. [20]) which are able to scan
articles from the scientific literature (or from internal research reports) and extract chemical
information, including connection tables of chemical structures and chemical reaction
data (such as solvent, temperature of reaction, etc.).
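The check digit mentioned above is the only internal structure a CAS RN carries, and it can be verified mechanically. The sketch below uses the commonly documented rule (the digits before the check digit are weighted 1, 2, 3, ... from right to left, and the sum is taken modulo 10); the rule is not stated in this paper, and the function names are hypothetical.

```python
def cas_check_digit(body_digits: str) -> int:
    """Check digit for the digits of a CAS RN that precede the final digit."""
    total = 0
    # The rightmost digit is weighted 1, the next one 2, and so on.
    for weight, ch in enumerate(reversed(body_digits), start=1):
        total += weight * int(ch)
    return total % 10


def is_valid_cas_rn(cas_rn: str) -> bool:
    """Validate the hyphenated form, e.g. '7732-18-5'."""
    parts = cas_rn.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return False
    # Layout: 2 to 7 digits, then 2 digits, then the single check digit.
    if not (2 <= len(parts[0]) <= 7 and len(parts[1]) == 2 and len(parts[2]) == 1):
        return False
    return cas_check_digit(parts[0] + parts[1]) == int(parts[2])


print(is_valid_cas_rn("7732-18-5"))  # water
print(is_valid_cas_rn("7732-18-4"))  # same number with a wrong check digit
```

The check confirms only that a number was transcribed consistently; it says nothing about the structure of the compound, which is the point made in the text.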
In spite of the wide use of the CAS RN in chemistry, and particularly in chemical
regulation by the EPA (Environmental Protection Agency) [21], chemical names are also
still very widely used for administrative and regulatory purposes. In fact the recently
developed AUTONOM program [22] was initially conceived for internal processing at the
Beilstein Institute for their handbook and database work. AUTONOM takes most
(>75%) [23] chemical structures and creates an IUPAC approved name for that chemical
structure. Its administrative value for internal and regulatory purposes is such that it is
now a commercial package [24]. Thus while there will be a need for chemical names and
registry numbers, the primary need will not be a scientific one.
The ease of creating large databases of chemical structures, along with the efforts
underway to create standard molecular data descriptions of molecules (e.g., the SMD
(Standard Molecular Data) [25] and STAR (Self-defining Text Archive and Retrieval) [26]
projects) and the increased ability to send large volumes of data over networks at high
speeds, make it seem reasonable to predict that the use of the CAS RN for searching for a
chemical will decrease over time. One of the major drawbacks of the CAS RN (and the
Beilstein Registry Number as well) is the lack of these numbers in the private, and
generally, confidential files of companies. It is not possible to use an internal
identification number to search public files and vice versa. Only the chemical structure
itself, when used as the "search term" will be a practical way to see if a chemical is in
another database. As different organizations represent their structures slightly
differently, only the advent of a standard molecular representation or an interchange
program (such as the recently developed program ConSystant [27]) will allow a user to
readily search for related structures in another database of chemical structures.
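The idea of using the structure itself as the search term can be illustrated with a toy comparison of two connection tables. This is a brute-force sketch over all atom renumberings, practical only for a handful of atoms. It is not how SMD, STAR, or ConSystant work, and the data layout and names are assumptions made for illustration.

```python
from itertools import permutations


def same_structure(atoms_a, bonds_a, atoms_b, bonds_b):
    """Return True if two small connection tables describe the same graph.

    atoms_*: list of element symbols, indexed by atom number.
    bonds_*: dict mapping frozenset({i, j}) to a bond order.
    """
    if sorted(atoms_a) != sorted(atoms_b) or len(bonds_a) != len(bonds_b):
        return False
    n = len(atoms_a)
    for perm in permutations(range(n)):
        # perm[i] is the atom in B proposed as the match for atom i in A.
        if any(atoms_a[i] != atoms_b[perm[i]] for i in range(n)):
            continue
        renumbered = {
            frozenset(perm[i] for i in pair): order
            for pair, order in bonds_a.items()
        }
        if renumbered == bonds_b:
            return True
    return False


# Ethanol (heavy atoms only), numbered differently in two hypothetical files.
public_atoms = ["C", "C", "O"]
public_bonds = {frozenset({0, 1}): 1, frozenset({1, 2}): 1}
private_atoms = ["O", "C", "C"]
private_bonds = {frozenset({0, 1}): 1, frozenset({1, 2}): 1}

# Dimethyl ether: same atoms, different connectivity.
ether_atoms = ["C", "O", "C"]
ether_bonds = {frozenset({0, 1}): 1, frozenset({1, 2}): 1}

print(same_structure(public_atoms, public_bonds, private_atoms, private_bonds))
print(same_structure(public_atoms, public_bonds, ether_atoms, ether_bonds))
```

The two ethanol records match despite their different atom numbering, while the ether record does not. No registry number from either file is needed for the comparison.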
RAW VS. PROCESSED INFORMATION
Most of the data and information in the major chemical databases of the world are
raw and unprocessed. The two largest collections, those of CAS [28] and VINITI [29] are
bibliographic. In these two databases, whatever the author says is accepted at face value.
Since almost all of the papers abstracted are refereed, either the author's abstract is used
or CAS or VINITI write an abstract, based on the information provided in the
publication. Only the Beilstein and Gmelin databases perform some measure of
evaluation, although most of this work is really extraction of information. For example,
in the Beilstein Handbook, online database, and CD-ROM Current Facts, the data are
extracted. In the past there were some additional efforts made to assure that enough
information was published in the original work to guarantee the work could be
reproduced, and not every chemical reaction or piece of data was used by Beilstein. The
Beilstein staff has never had the financial capability to evaluate the very large volume of
data they process. Such data evaluation is rare, with the most well known example being
that of the US NIST/SRD (National Institute for Standards and Technology, Standard
Reference Data Program). With the possible exception of the Mass Spectrometry
database [9], none of the NIST databases are of any significant size in comparison to the
numbers of chemicals for which there is data available. Beilstein performs a "second"
and valuable peer review, albeit too late to keep questionable or poorly defined or
unexplained science from being published. In any event, today, due to the high costs of
labor, both the Beilstein Institute and VINITI have fewer in-house staff than in past years
and rely more on part-time outside workers. With some 65% of the costs at CAS being
labor, it is reasonable to believe that CAS will be moving in this direction again. (CAS
once had almost all of its abstracting done by outside chemists.)
CD-ROM
CD-ROM's are computer hardware devices that are just beginning to find use in
chemistry. Again the problem of the lack of good software, adequate computer
hardware, and available databases has limited the growth and use of this medium.
While most of the educational science libraries in the USA have CD-ROM drives it is
estimated that currently perhaps some 1% of the computers which are in chemistry labs
and libraries have a CD-ROM drive [30]. This estimate is supported by a non-scientific, non-systematic request for information which the author sent to the
approximately 400 subscribers to the Chemical Information News Group (see above). The request
resulted in two responses, one from Exxon and one from Rutgers (Chemistry and Physics
Departments) [31]. In both cases the information specialists who replied indicated they
knew of no end users in their organizations who had CD-ROM's on their PC's. In
addition to this survey, a number of vendors of CD-ROM's were contacted. All
considered the sales and types of users to be confidential information. None kept track of
the type of users who were buying their products. In one case, that of the Aldrich
Chemical catalog on CD-ROM, it was learned that while the sales (at $25 per CD-ROM)
of the catalog on CD-ROM were under 1000, Aldrich publishes 2.7 million copies of its
printed catalog [32], which are distributed free of charge. Even in an area where computers are
used more routinely in chemistry, namely computational chemistry, less than 10% of the
customers using the molecular modeling software program SYBYL have requested to
receive their software update on the CD-ROM offered by the vendor. This could be
compared to the computer science community, where more than 80% of the users of Sun
workstations receive their software and documentation on CD-ROM. Thus chemists
clearly have a long way to go before they become as comfortable with this medium as
computer programmers and computer systems staff are. At present I would consider the
state of the chemists' use of CD-ROM as in its infancy, but I strongly feel, with proper
pricing, that the growth curve for CD-ROM usage is likely to be exponential in the
coming years, as evidenced by the use of CD-ROM in other fields, where prices are quite
low (and volume is high). With the expected price of an internal (one that fits in the same
space slot as a PC floppy disk drive) CD-ROM drive dropping to under $200 in 1994, usage
should increase. Perhaps some bright marketing person will discover that offering a
"free" CD-ROM drive with the purchase of their product will do wonders to overcome the
current energy barrier to buying a CD-ROM.
CD-ROM's, which today store about 660 million characters (about 330,000 pages
of text), will, by the year 2000, replace many reference books and chemical catalogs on
the chemists' bench and bookshelf. A few pioneers in this area, such as the Beilstein
Institute in Frankfurt, Germany are leading the way to what will clearly be the library of
the future. The Beilstein Current Facts CD-ROM has about one year of extracted data
from the literature (without author names, titles, or abstracts), along with a computer
chemical structure search system, all neatly collected on a single CD-ROM. Someday, the
weekly issue of Chemical Abstracts will come to each chemist this way. Each chemist will
have the Merck Index, CRC Handbook of Chemistry and Physics, ACS Directory of
Graduate Research, and a few ACS journals, all on CD-ROM's. By the year 2000 it
should be possible to custom order a set of books on CD-ROM. For example, the ACS
Symposium Series of several hundred books could be entered into computer readable
form and then books "printed" on a CD-ROM on demand, the same way floppy disks are
copied today. Using keywords or phrases you could select a set of books you might want
on your bookshelf (actually your CD-ROM jukebox device), and send the order for such a
disk to be mastered and mailed to you. Certainly custom made orders would be more
expensive than pre-packaged ones, but, if marketed and priced favorably, should be well
within the means of most chemists. Groups of chemists, such as the polymer or materials
chemists, could create their own CD-ROM's based on existing volumes already printed.
IUPAC could create a CD-ROM of Pure and Applied Chemistry. The list is almost
endless.
All that is needed for CD-ROM to become widely used and for almost every chemist
to have his or her own private library (the way it was in days of yesteryear) is
reasonable pricing. $2000 - $5000 or more for a CD-ROM (from Silver Platter, CRC, or
Chapman & Hall) is not likely to get many customers. The ACS has just released the
Directory of Graduate Research (DGR) on CD-ROM [33]. The cost is twice that of the
hard copy version. The reason for doubling the cost is said to be that the product is
"much more valuable". The DGR even has a limit on the number of hits you can print
out, for fear that their mailing list business will be adversely affected if users can print
out 700 names and addresses from this CD-ROM. At present one can print out only 50
names and addresses at a time (and of rather poor quality at that: names are in inverted
order, with some things capitalized, etc.).
The current mentality of "this is more valuable, so let's charge more" needs to be
replaced by "this is more valuable, let's reduce the price and sell a lot more". "High
volume" seems to be a phrase which has been genetically engineered out of the minds of
those selling CD-ROM products in chemistry. One might wonder what the price of a PC
would be today and how many would be sold if PC's were priced like CD-ROM chemistry
products.
ELECTRONIC BOOKS AND JOURNALS
The last specific topic to be covered in this paper is the area of books, journals,
and online chemical information. In the online area it can be seen from the current usage
of scientific and technical databases that the current generation of chemists is not very
familiar with computers and chemical information. The costs of searching the chemical
literature (including the various charges of connect time, search hits, printouts, and so
forth) are high, averaging well over $100 per hour connected to a host mainframe
computer. Compared to browsing through a book, journal, or an issue of the printed
Chemical Abstracts, this is expensive. Most of the information is not evaluated. The
details of the chemical synthesis method or the properties of a molecule or material are
either not in the abstract or need to be found by reading the journal article or book
chapter. With high fixed expenses in the creation of the information, because
abstracting and indexing is and, I believe, will always be a very labor-intensive effort
(even with such expected developments as the potentially useful software of Johnson et
al. [20]), there are two ways to recover the costs: either charge a lot of people small sums
of money or charge a few people a lot of money. The chemical information industry, for
the most part (and there are a few exceptions), has decided to opt for high prices. The
results are what most would expect. Few of the hundreds of thousands of chemists
referred to in the beginning of this article use computerized databases. Few subscribe to
weekly literature searching (Selective Dissemination of Information - SDI) of online
databases. The reason is primarily economic. Schools and even many companies cannot
afford to have hundreds of chemists spending such large sums of money on literature and
related online searching. Hopefully some of the database and vendor companies will
begin to experiment with the notion of marketing to the thousands of potential users
waiting for reasonably priced products. Years ago many people had personal
subscriptions to sections of Chemical Abstracts, to journals, and so on. Will the
computer revolution in general and CD-ROM's in particular cause history to come full
circle? I believe by the year 2000 this is a distinct possibility if there are changes in the
way in which vendors market their products. While books will never disappear from the
chemist's desk, I think CD-ROM will become the preferred medium of distribution and
use in many areas of chemical information. These areas include reference works,
collections of books and articles on a particular subject, as well as chemical catalogs of
supplies, software, and software updates.
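The cost-recovery choice described in this section (many users paying small sums versus a few users paying large sums) can be illustrated with a short calculation. The following is only a sketch: the fixed cost and the subscriber counts are hypothetical figures chosen for illustration, not data from any database producer, and the function name break_even_price is likewise an assumption.

```python
def break_even_price(fixed_cost: float, subscribers: int, unit_cost: float = 0.0) -> float:
    """Price each subscriber must pay to recover the fixed cost.

    fixed_cost  -- total cost of creating the information (abstracting, indexing)
    subscribers -- number of paying users sharing that cost
    unit_cost   -- optional per-copy cost of delivery (disk, postage, and so on)
    """
    if subscribers <= 0:
        raise ValueError("subscribers must be positive")
    return fixed_cost / subscribers + unit_cost


# Hypothetical annual cost of producing a database, in dollars (an assumption).
FIXED_COST = 1_000_000.0

# The same fixed cost spread over ever larger (hypothetical) user populations.
for n in (500, 5_000, 50_000):
    price = break_even_price(FIXED_COST, n)
    print(f"{n:>6} subscribers -> ${price:,.2f} each")
```

Because the fixed cost does not change with the number of users, the break-even price falls in direct proportion as the user base grows, which is the arithmetic behind the high-volume, low-price argument made here.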
As for computer-based journals, as stated in Table 13, publishing in a printed
journal is now the socially accepted means of communication and leads to rewards and
promotions. While the means of communication can easily change, the social reward
situation is quite different. Universities and most other organizations that have peer
review rely very heavily on refereed scientific journals in their evaluation criteria. While I
feel my career has not suffered due to the software and databases I have written and
developed, I do not think this is the usual case. Experiments in journals which have a
substantial portion of their activity in non-hard copy form are now starting to appear.
One such case was Tetrahedron Computer Methodology (TCM) [34]. This journal died
after some four years, owing to a variety of technical and nontechnical reasons. A new
partly online journal, the Online Journal of Current Clinical Trials (OJCCT), has
finally gotten off the ground and is now available [35, 36]. This journal has more
institutional support than TCM, and so it may make inroads in this area. Additionally,
there is another new journal, Protein Science [37], a biochemistry journal which
started publishing in January 1992. Protein Science comes with a floppy disk of graphics,
which the journal calls "kinemages". In any event, I expect that these experiments,
coupled with better delivery mechanisms (for chemistry this is primarily software for the
transmission and viewing of graphics), will by the end of the decade lead to a few journals
making real headway towards giving the chemical community automated journals.
Additional examples of electronic publishing activities (journals, electronic libraries and
so forth) and publishing experiments can be found in a recent article by Borman [38].
ECONOMIC ISSUES [39]
The recent (and perhaps ongoing) recession in a number of developed countries of
the world has led to the re-examination of how to sell products. When people don't fly,
airlines lower their fares to fill seats. When people don't buy automobiles, General
Motors, Ford, and Chrysler, along with foreign car companies, lower their prices to
stimulate sales. When hotels have occupancy rates below 50% and need 65% occupancy
to at least break even financially, they offer cheap rooms. There are many more
examples outside the chemical information area, but it should suffice to state that the
Japanese domination of the consumer electronics industry clearly shows that lower prices
lead to higher volumes and generally higher profits. As examples in chemical information, I
need only mention such publications as the 11th edition of the Merck Index [40] (priced
at $30) or the CRC Handbook of Chemistry and Physics [41] (priced at $100), now in its
73rd edition. Both of these products sell tens of thousands of copies.
In chemical information there seems to be a pervasive attitude that information is
valuable and prices must be high. Information is no doubt valuable, as evidenced by
state and corporate intelligence gathering. However, the notion that because something is
computer readable or in electronic form it MUST be priced higher than the
corresponding printed product is probably not the way to increase usage. Eastman
Kodak has recently released its chemical catalog on a floppy disk (similar to the CD-ROM
Aldrich chemical catalog mentioned earlier in this article). Some highly paid Kodak
employee decided to charge $20 for a copy (vs. a free multi-hundred page printed
catalog). Given the current cost of postage, it is certain that printing and
mailing the catalog costs more than sending a floppy disk. Why then charge
for the disk?
A recent front page Wall Street Journal article on the Electronic Campus [42] talked
about what the future may be like in the publishing industry and in libraries in the next
decade. The article described how most textbook publishers are not moving into the
electronic age very quickly, if at all. One electronic product mentioned in the article is
a CD-ROM about Greek and other ancient civilizations. The CD-ROM contains 25 volumes
of Greek text (with English translation), a Greek dictionary, some 6000 photos, and
drawings of artifacts. All this sells for $120, much less than the price of the
corresponding printed products. This sort of entrepreneurial effort is likely to result in
increased sales of such materials and is likely to be what lies in the future. Clearly, as
the article points out, textbooks are getting so expensive (and are usually heavier than a
portable computer containing much more information) that students
are buying fewer textbooks. Those books they buy are often from used book stores (where
the publisher does not get a royalty) since the cost of books is now so high.
In 1978 the total annual online information (scientific and non-scientific (primarily
legal) information) revenues were about $40 million [43]. By 1990 this had grown to an
annual rate of $690 million. The most successful computer chemistry software company,
Molecular Design Ltd (MDL) of San Leandro, California, in roughly the same period of
time has seen revenues go from $0 to about $50 million per year. Molecular modeling
companies, of which there are at least a half dozen, together probably have total annual
revenues of less than current MDL sales. (Sales revenues are based on software sales and
exclude hardware and consulting/consortium groups.) Compared with other industries,
and especially with other areas of the computer industry, these revenues are
rather low and not impressive [44]. I would hope that companies in this
field will begin to experiment with new marketing approaches which will both increase
the usage of their products and reach a larger segment of the chemistry population. The
Autodesk effort with its HyperChem software is one bright example in a gloomy field.
Without a greater volume of usage it is possible that information will remain a commodity
for only a small portion of the chemical community.
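To put the online revenue figures quoted above ($40 million in 1978 and $690 million in 1990) on a per-year basis, one can compute a compound annual growth rate. This is a minimal sketch using the standard compound-growth formula; it is not a calculation taken from reference [43], and it assumes that the two figures are 12 years apart and that growth was smooth over the period.

```python
def compound_annual_growth_rate(start_value: float, end_value: float, years: float) -> float:
    """Return the constant yearly growth rate that takes start_value to end_value."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the text for total online information revenues.
REVENUE_1978 = 40e6    # about $40 million
REVENUE_1990 = 690e6   # about $690 million

online_growth = compound_annual_growth_rate(REVENUE_1978, REVENUE_1990, 1990 - 1978)
print(f"Implied online revenue growth: {online_growth:.1%} per year")
```

The result is only as good as the two "about" figures it is built on, and it says nothing about how that growth was divided between price increases and additional users.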
To reinforce what was said above with regard to pricing of CD-ROM products in
chemistry, one of the best summaries of the economic and pricing problems
with which the chemical information community has not been able to deal was recently
given by Harry Collier, who stated [45]:
"After over 20 years, it appears to us that confusion still reigns because too few people in
this branch of the information business have a realistic assessment of what their market
is. 'Oh, yes,' they will say on Monday, 'we have a high-value product which we sell
mainly to a specialist niche marketplace.'
'But also.' they will say on Tuesday, 'we would like additionally to reach a market of
thousands of eager end-users and expand our usage. And we also like to cater for
impecunious academics, for under-developed nations and for small users.'"
SUMMARY
I believe there are two main reasons for the low use of computers and computer
systems by chemists: cost and ease of use. The economics of chemical information have, up
to this point in time, made it a tool for a few users, for the wealthy in the more developed
nations of the world, and for the wealthier companies in those countries. Easier-to-use
computer systems will begin to generate more usage. This will lead, slowly, to
lower individual pricing schemes. This will happen in spite of the current marketing
policies of most electronic chemical information providers. This should, in turn, really
begin the age of lower individual computerized chemical information costs. (The classic
chicken and the egg situation.) I believe that with the current and upcoming generation
of hardware with the power of an Intel 486/Pentium (at 50-300 MHz) or an equivalent
UNIX-based Sun, DEC, or Silicon Graphics workstation, software can be designed and
implemented which will have two main features. The software will be reasonably easy to
use (so that a user can still remember the next day or week how to use the program or
database system being accessed) and powerful enough to do the actual job that needs to be
done. By powerful I mean that the software will have the necessary "user-friendly"
interfaces (graphics, mouse, voice command, and so on), and have some AI (artificial
intelligence) capability and knowledge of the subject to assist the end user in getting his or
her job done. However, without close cooperation between software developers and
database producers and their end users, this will not happen. Both the software and
databases need to be properly designed to meet the actual end-user needs, not the needs
which the vendors perceive the users have. Talking to, and more importantly, listening
to, the customer or end user is something the chemical information and related industry
will have to come to grips with in the next few years if real and substantial progress is to
be made for both parties.
Computers and the related technology described in this article hold the
promise that by the 21st century more chemical information and computer systems will be
available to the entire world-wide community. With larger numbers of users this should
allow the costs of the products being developed to be spread across a much wider number
of people, leading to higher usage, higher productivity, and lower costs for all
computer-related products.
ACKNOWLEDGEMENTS
The author wishes to acknowledge numerous colleagues who have provided
information and comments on this paper. Most of these people are specifically
acknowledged in the references.
REFERENCES
[1] Based on lectures given at the 10th International Conference on Computers in
Chemical Research and Education (ICCCRE), Jerusalem, Israel, July 1992 and the
Second International Conference on Computer Applications to Materials and Molecular
Science and Engineering, Yokohama, Japan, September 1992.
[2] Tom Greeves, Daratech Inc., 140 6th Street, Cambridge, MA 02142. Phone: 617-354-2339.
[3] Casale, Charles T. and Gelin, Bruce R., "Growth and Opportunity in Computational
Chemistry", 1992, The Aberdeen Group, 92 State Street, Boston, MA. Phone: 617-723-7890,
FAX: 617-723-7897. There is also a more extensive 1989 report published by the
same organization entitled "Conflicting Trends in Computational Chemistry".
[4] Ouchi, G., Brego Research, private communication. R. Venkataraghavan, Lederle
Labs, private communication. T. Pierce, Rohm & Haas, private communication. J.
Witiak, Rohm & Haas, private communication. H. Woodruff, Merck Labs, private
communication. Low usage is defined as 0-25%, moderate usage as 26-49%, high as
50-75%, and very high as 76-100%.
[5] Attempts to find evidence or even an estimate of the number of chemists who have
PC's have proven futile. The few firms that have done market surveys in this field or for
computer sales in general, such as the Aberdeen Group [reference 3], as well as Daratech,
Inc., Dataquest, and International Data Corporation, had no information, nor did they
have any idea where to get such information.
[6] In October 1992 STN began to deliver SDI results electronically, but only to an
STNmail ID. STNews, 8, #10, page 1, October 1992, North American Edition.
[7] In the PC computer field a regular flow of books comes with computer disks. These
disks are intimately related to the contents of the book. For example, a disk of DOS
enhancement programs comes with the Dvorak book on DOS and PC performance.
Dvorak, J. C. and Anis, N., "Dvorak's Inside Track to DOS & PC Performance", ISBN:
0-07-881759-5, $39.95. Osborne McGraw-Hill, 1992. 2600 Tenth Street, Berkeley, CA
94710. Phone: 510-549-6600, FAX: 510-549-6603.
[8] Wiggins, G. D., "Chemical Information Sources", McGraw-Hill Series in Advanced
Chemistry, ISBN: 0-07-909939-4, McGraw-Hill, New York, 1991. This book includes a
"Chemistry Reference Sources Database" of 2156 records plus the Pro-Cite search
software for IBM PC's. Pro-Cite is available from Personal Bibliographic Software, PO
Box 4250, Ann Arbor, MI 48106-4250. Phone: 313-996-1580; FAX: 313-996-4672.
[9] PC version of the NIST/EPA/NIH Mass Spectral Database, March 1992 Version.
Available from NIST/SRD, Bldg 221/A320, Gaithersburg, MD 20899 (Phone: 301-975-2208; FAX: 301-926-0416). The price is $1200 for the database or $200 for those who had
bought previous versions.
[10] PC World, page 210, August 1992.
[11] Heller, S. R., "The Future of Chemical Information Activities", J. Chem. Inf.
Comput. Sci., 33, 284-291 (1993).
[12] One list of directories of academic e-mail conferences is available from the Kent
State University file server. It was developed by Diane K. Kovacs and is copyrighted. It is
available, via Internet, by ftp (file transfer protocol) from KSUVXA.KENT.EDU. The
lists of directories are in the LIBRARY directory of the computer and are titled
ACADLIST.FILEx, where x is 1 to 7, depending on your area of interest (physical
sciences, biological sciences, etc.). ACADLIST.FILE7 is the file containing the news
groups in the physical sciences and mathematics.
[13] There are new user groups being formed all the time. However, it remains to be seen
how long it will take for these to actually take hold and have staying power.
Springer-Verlag, the company that distributes the Beilstein database, started a system for
users of the Beilstein database on the CompuServe computer system. After a year it was found
that usage was too low to continue it. In the past few months user group conferences on
HyperChem software, CHARMm software, Organic Chemistry (actually a restart of a
system that died a few years ago), Amber software, BioSym software, and SYBYL
software have been started up. IUPAC hopes to initiate one in the near future for its
many members and affiliates around the world. It will be interesting to see how many of
these remain and in what form they remain in 1-2 years.
[14] BIS Strategic Decisions Global Electronic Messaging Service, 1991.
[15] Pierce, T., Rohm & Haas, private communication.
[16] Meadows, A. J., and Buckle, P., "Changing Communication Activities in the British
Scientific Community", J. Doc., 48, 276-290 (1992).
[17] Garson, L., "Data Design Issues in Creating Electronic Products for Primary
Chemical Information", Proceedings of the Annecy Conference, pages xxx-xxx (1993).
[18] The WordPerfect Corporation makes its WordPerfect manuals available on disk,
which can be searched for words using any word processor.
[19] Calem, R. E., "Coming Soon: The PC With Ears", New York Times, Business
Section, page F9, August 30, 1992.
[20] Ibison, P., Johnson, A. P., Kam, F., Neville, A. G., Simpson, R., Tonnelier, R.,
and Venczel, T., "Automatic Extraction of Chemical Information from the Literature",
page 25, Abstracts of the 10th ICCCRE, Jerusalem, Israel, July 1992.
[21] EPA Order 2880.2, "Use of Chemical Abstracts Service Registry Data in ADP
Systems", June 30, 1975.
[22] Goebels, L., Lawson, A. J., and Wisniewski, J. L., "AUTONOM: System for
Computer Translation of Structural Diagrams into IUPAC-Compatible Names.
Nomenclature of Chains and Rings", J. Chem. Inf. Comput. Sci., 31, 216-225 (1991).
Wisniewski, J. L., "AUTONOM: System for Computer Translation of Structural
Diagrams into IUPAC-Compatible Names. 1. General Design", J. Chem. Inf. Comput.
Sci., 30, 314-332 (1990).
[23] The figure of 75% refers to chemicals published in the current organic chemistry
literature. The program does not handle stereochemistry, charged species, inorganics,
peptides, or sugars. A. J. Lawson, private communication.
[24] AUTONOM is an IBM-PC software program and costs $1980 (industry price) or
$980 (academic price). It is available from Springer-Verlag Publishers, 175 Fifth Avenue,
NY 10010. Phone: 212-460-1622, FAX: 212-533-5781.
[25] Barnard, J. M., "Draft Specification for Revised Version of the Standard Molecular
Data (SMD) Format", J. Chem. Inf. Comput. Sci., 30, 81-96 (1990).
[26] Hall, S. R., "The STAR File: A New Format for Electronic Data Transfer and
Archiving", J. Chem. Inf. Comput. Sci., 31, 326-333 (1991).
[27] ConSystant is an IBM PC-based DOS program available for $199 from
ExoGraphics, PO Box 655, West Milford, NJ 07480-0655. Phone: 201-728-0188, FAX
201-728-0735.
[28] The American Chemical Society established the Chemical Abstracts Service in 1907.
The printed Chemical Abstracts and the related Chemical Abstracts databases are
available from CAS, 2540 Olentangy River Road, PO Box 3012, Columbus, OH 43210-0012. Phone: 614-447-3600, FAX: 614-447-3713. At present there are some 9.5 million
abstracts in the CA computer readable database and about 11.5 million chemical
structures, with CAS Registry Numbers, in the structure file associated with the bibliographic
database. There are some 17.5 million names associated with the 11.5 million structures.
[29] VINITI, The All Russian Institute of Scientific and Technical Information, was
established in 1952. Since that time it has collected over 31 million source documents in
all areas of science (not just chemistry). Of these there are some 11 million abstracts in
computer readable form. Its main publication is "Referativnyi Zhurnal VINITI". VINITI
is located at 20a Uslevitcha Street, Moscow 125219, Russia. Phone: 011-7-095-152-6163,
FAX: 011-7-095-943-0060. VINITI distributes its products outside of Russia through
Access Innovations, Inc. 4314 Mesa Grande S.E., Albuquerque, NM 87108. Phone 505-265-3591, FAX: 505-256-1080.
[30] Badger, R., Springer-Verlag, New York, private communication. Gary Wiggins,
Chemistry Library, Indiana University, private communication.
[31] D. Johnson, Exxon Corporation, private communication. H. Dess, Rutgers
University/Chemistry & Physics Library, private communication.
[32] Publication Department, Aldrich Chemical Company, 940 West St. Paul Avenue,
Milwaukee, WI 53233. Phone: 414-273-3850.
[33] Directory of Graduate Research, CD-ROM edition, ACS Books, ACS, 1155 16th
Street, NW, Washington DC, 20036 (1993). Phone: 202-872-4600.
[34] Tetrahedron Computer Methodology (TCM) was published by Pergamon Press
(UK) between 1988 and 1992.
[35] The Online Journal of Current Clinical Trials (OJCCT), a joint venture of the
American Association for the Advancement of Science (AAAS) and the OCLC Online
Computer Library Center, Inc. The price is $95 plus monthly telecommunication charges.
For further information contact the journal at 1333 H Street, NW, Washington, DC
20005. Phone: 202-326-6446.
[36] On 28 August 1992 it was announced (Science, Volume 257, 4 September 1992, page
1341) that OJCCT [35] has linked up with the journal "The Lancet" so that "The Lancet"
could publish a printed, abridged form of a current OJCCT article.
[37] Borman, S., C&E News, pages 26-27, February 17, 1992.
[38] Borman, S., "Advances in Electronic Publishing Herald Changes for Scientists",
C&E News, pages 10-24, June 14, 1993.
[39] Heller, S. R., "The Economic Future of Numeric Databases in Chemistry",
Proceedings of the 15th International Online Information Meeting, London, December
1991, pages 47 - 50. Published by Learned Information, Oxford, UK.
[40] The Merck Index, 11th Edition, S. Budavari, Editor, Merck & Co. Inc., Rahway,
NJ 07065-0900 USA, 1989 (Phone: 908-594-4904).
[41] CRC Handbook of Chemistry and Physics, 73rd Edition, D. Lide, Editor, CRC
Press Inc., 2000 Corporate Blvd., N.W., Boca Raton, FL 33431 USA, 1992 (Phone: 407-994-0555, FAX: 407-994-3625).
[42] Cox, M., "Electronic Campus - Technology Threatens to Shatter the World of
College Textbooks", Wall St. Journal, page 1, June 1, 1993.
[43] Williams, M., "Highlights of the Online Database Industry", pages 1-4, Proceedings
of the National Online Meeting, New York, May 1992. Published by Learned
Information Inc., 143 Old Marlton Pike, Medford, NJ 08055 (Phone: 609-654-6226, FAX:
609-654-4309).
[44] For example, the sales figure of $50 million for MDL after 10 years can be
compared with that of Lotus Development Corporation, which was $53 million in its first
year (1983). To date over 9 million copies of Lotus 1-2-3 have been sold. Barron's
magazine, September 14, 1992, page 12.
[45] Collier, H., Monitor, #148, pages 2-3, June 1993.