Gazing into the Future of Chemical Information Activities




Stephen R. Heller
USDA, ARS
Beltsville, MD 20705-2350 USA
SRHELLER@ASRR.ARSUSDA.GOV


INTRODUCTION

This presentation [1] is designed to stimulate discussion of new computer-based technology which is now available, or will become available, to chemists, and which will be applied to chemistry and, most importantly, used by chemists in their everyday activities from today well into the beginning of the next century. Starting as a presentation at a EUSIDIC meeting in Heidelberg in 1988, this paper has evolved over the past five years, and no doubt will continue to evolve as new phenomena stimulate changes in the habits and activities of chemists.

As computer technology has developed, the use of computers in chemistry has expanded from simple arithmetic calculations to very broad areas of chemistry. This paper delves into some of these areas, tries to summarize the current state of the use of computers in chemistry, and describes what the author believes the use of computers in chemistry will be a little over a decade from now, roughly the years 2005-2010.

BACKGROUND

Computers, like any other technological tool, have become integrated gradually into the daily routine of chemists. The widespread use of computers in chemistry has clearly been handicapped by a number of factors, a major one being the lack of familiarity with this new technology on the part of chemists and managers in the field of chemistry. This is true from academia to government to industry. In working to locate supporting facts for this article I heard this belief mentioned a number of times. Phrases like you need to "raise a generation of people who are comfortable with these tools" [2], and "raise a generation of advocates" [3] came from professionals in the field of market research. Thus I concluded that wide-scale and heavy use of computers by chemists has not yet started.

At present, the routine use of computers in support of research and production in a chemistry lab or office, other than for word processing, spreadsheets, and literature searching, is low [4] (defined as less than 25% of the potential users). Why is this the case? There are a few hundred thousand chemists in the USA and many of them have computers. It is generally thought that most (>90%) of these individuals have computers, virtually all of which are IBM PC's and clones or Apple Macintoshes, the remainder being Sun, Silicon Graphics, DEC, or other manufacturers' workstations. It has not been possible to obtain any definitive information on the number of scientists or chemists with PC's; marketing surveys have not addressed such questions [5].

If one combines these numbers with those in other developed countries, one could estimate that there are today about 800,000 chemists [5] as a potential market for various computers and computer systems and for software specifically designed to support the needs of the chemist. What I hope this paper will address is why, if there are more chemists today than 30 to 40 years ago, the use of chemical information by end users is lower (fewer books bought, fewer CA subscriptions, etc.). Logic would dictate that with more chemists and more information there should be more access and use. Since this is not the case, what is likely to help bring usage back to a logical or reasonable level, based on the size of the audience? This paper will examine some possible reasons why large numbers of chemists have not yet decided that computers are a necessary tool for the conduct of their everyday research and administrative work, and thus why computers and related computer technology are not used extensively.

Please note that when the qualitative terms "few" or "low" (as defined in reference 4) are used for the overall use of a particular computer, computer program, or computerized database, they are meant in comparison to the overall potential purchase and use by some 800,000 potential users (chemists) worldwide. With the exception of one series of marketing studies in the area of computational chemistry [3], there have, to date, been no published studies of the use of computers and related computer systems by end users in the chemical community. (Studies on the use of computers and databases in libraries are not regarded here as end-user studies.) The reason for this is the small size of the current market, which does not justify the investment in such a survey [2,3]. Thus the reader will have to accept the lack of hard statistics for many of the statements presented here.

While selling a total (over the lifetime of the program) of a few hundred or even a few thousand molecular modeling or structure drawing programs is, today, a major accomplishment in the business of software for chemistry, it is a minor event relative to the daily sales of word processing, database management, spreadsheets, and other such programs. The lack of any public software companies in chemistry (i.e., software companies devoted exclusively to selling software for chemistry and whose stock is available to the public) is indirect evidence to support this position that there is, at present, no major financial incentive to go into this business.

Before proceeding to the main thrust of the speculations about the future of computers in chemistry, it is important to note that there are some labs, as well as areas of chemistry, in which the use of computers is very high. As mentioned above, computational chemistry is clearly one of these areas. While it is estimated that there are 1000 sites worldwide with some 2000 academic and industrial chemists now involved in this area [3, page 4], this is less than 1% of the chemists in the world. In almost all areas of spectroscopy computers are heavily used to acquire and analyze data. A reader involved in these areas of chemistry would certainly not fall within the "low" range of computer use. However, these "pockets" of high computer use, when averaged over the entire chemistry community, are, I believe, consistent with the levels of usage stated here.
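As a rough check of the "less than 1%" figure, the short Python calculation below divides the estimated number of computational chemists by the estimated worldwide total. It uses only the two estimates quoted in this paper; the 25% threshold is the definition of "low" usage from reference 4, and the constant and function names are illustrative.

```python
# Estimates quoted in the text (not new data).
WORLD_CHEMISTS = 800_000          # potential users worldwide [5]
COMPUTATIONAL_CHEMISTS = 2_000    # chemists at ~1000 computational chemistry sites [3]
LOW_USAGE_THRESHOLD = 25.0        # "low" = less than 25% of potential users [4]


def share_percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


computational_share = share_percent(COMPUTATIONAL_CHEMISTS, WORLD_CHEMISTS)
print(f"Computational chemists: {computational_share:.2f}% of all chemists")
print("Below the 'low usage' threshold:", computational_share < LOW_USAGE_THRESHOLD)
```

On these estimates the share is about 0.25%, comfortably under the 1% quoted.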



COMPUTER AND CHEMICAL INFORMATION ISSUES

Table 1 summarizes both the issues which are to be discussed here as well as the current and predicted level of activities in these areas. Space in this journal does not permit a full analysis of all of these topics. Thus a few representative issues will be mentioned. Tables 2-12 list details for many of these issues.



Table 1

Issues for Discussion

Topic | Today | 2000
Computer Literacy | Low-Moderate | Moderate-High
Computer Chip Technology | Intel 386, 486; Motorola 68000; RISC | Intel 986; Motorola 98000; RISC
Operating Systems | DOS, UNIX, Windows, OS/2, Macintosh | Mostly enhanced, user-friendly UNIX
Telecommunications | Moderate usage; 2400-9600 baud speeds | Heavy usage; 1 million+ baud speeds
Interfaces | Offensive/exacting | Transparent/voice based
Graphics | Low usage in most software | Predominant usage in most software
CD-ROM | Low end-user usage | High end-user usage
Chemical Information | Raw & unprocessed | Processed & analyzed
Online usage for chemistry | Low | Low
SDI | Manual or by post | Electronic
Databases | Bibliographic | Numeric & factual
Beilstein | E-V Series being published | E-V Series still being published
Chemical catalogs | Online searching of catalogs | Online ordering from catalogs
Chemical Identification | CAS RN & BRN | Chemical structure
Molecular Modeling | Few | Some
Educational software | Random; not integrated with textbooks | Integrated with textbooks
Publishing | Semi-electronic | Mostly electronic
Books | Thought of as probable dinosaurs | Thought of as probable dinosaurs
Instruments | Semi-automated | Fully automated with ISO data transfer standards




Table 2

Telecommunications/Networks


Today -

Networks being used routinely by many chemists. BITNET, CSNET, EARNET, Internet, JANET, NORDUNET, SPAN, and other networks used by scientists a few times per week. Some companies have internal networks for many of their end-user PC's. Telecommunications speeds in the range of 2400 - 9600 baud.

2000 -

Networks and e-mail used all of the time. Automatic interfacing between all networks routine. Automatic logins for mail done everyday before the scientist comes to work. E-mail automatically re-routed as you travel to meetings, holidays, and home.

Local-area and wide-area networks are widely available within most organizations. Large databases more readily available within organizations. Telecommunications speeds in the range of 2.4 million + baud.











Table 3

Interfaces

Today -

Programs in their infancy.

2000 -

Voice control for input with lots of graphics. Standards for graphics and data are common. IUPAC, CODATA, ASTM, ISO, and other organizations agree on data transfer protocols.







Table 4

Graphics

Today -

Usage in its infancy.

Lack of compatibility.

Lack of standards.

FAX transmission in its infancy.

2000 -

Graphics software packages (e.g., Microsoft Chart) are widespread.

Graphics routinely sent electronically.

PC's have built-in FAX's for receiving and transmitting chemical structures and tables of data.







Table 5

CD-ROM


Today -

Chemistry CD-ROM products are rare today.

Low density (600 MB) CD's.

e.g., Aldrich MSDS, Beilstein Current Facts, Canadian Toxicity Databases, NIST Mass Spectrometry, CAS 12th Collective Index, C&H - Dictionary of Natural Products

2000 -

New products and high density CD-ROM's (6 Billion + Bytes)

Heilbron Dictionary of Organic Chemicals

CRC Handbook, CAS volume(s) on CD-ROM

Subsets of CAS, Beilstein, & Gmelin

Collections of numeric databases

(e.g., IR, NMR, & MS databases from NIST & Chemical Concepts)

Most Journals





Table 6

Chemical Information


Today -

Most information is raw, unprocessed, and un-evaluated.

CAS, Beilstein, VINITI - most abstracting and data extraction is done in-house



2000 -

Greater reliance on processed and evaluated data, such as Beilstein, Gmelin, IUPAC data series, CRC Handbooks.

CAS, Beilstein, VINITI - economic factors will cause most abstracting and data extraction to be done by free-lance workers at home. Articles and abstracts all sent electronically from abstractor to abstracting service.







Table 7

SDI


Today -

Popular feature for vendors. Results mailed to customers or left for online downloading [6].

2000 -

Popular feature for vendors. Results automatically sent electronically to customers' PC's via networks. Customers can order SDI articles of interest electronically.


Table 8

Databases

Today -

Still in the age of Bibliographic databases.

2000 -

Second generation of databases: numeric and factual data overtake bibliographic databases in usage. Usage increases as scientists realize the need for (good) data for dry-lab work (modeling, etc.)





Table 9

Chemical Catalogs




Today -

Lots of printed catalogs. A few catalogs on disk or CD-ROM (e.g., Aldrich and Kodak).

2000 -

Catalogs on CD-ROM's. Users order directly over the phone from their labs. Ordering by credit card is routine.







Table 10

Chemical Identification




Today -

CAS Registry Number reigns supreme.

2000 -

With chemical structures in all important databases, special identification numbers have little use. Standard molecular data formats allow for interfacing between all public and private files.











Table 11

Educational Software


Today -

Random Usage. Software used in teaching high school and college chemistry does not come with textbooks, but as separate products.

2000 -

Software integrated into textbooks [7]
(G. D. Wiggins, "Chemical Information Sources" [8])
PC floppy disk programs part of all undergraduate texts.

Chemical Information courses have PC based tutorials and practical online sessions.



Table 12

Publishing


Today -

Journal articles are almost the only socially acceptable form of communication and basis for reward/promotion. Some scientific manuscripts are submitted in electronic form, but the process is neither widespread nor practical. Virtually all refereeing is done by postal mail, with some done by FAX.

2000 -

Printed journals still predominate, but electronic data submissions, electronic journals, and software programs are now part of the academic, government, and industrial chemists' reward/promotion system.

Leading journal publishers use electronic submissions to speed up processing of publications, ease data extraction, and improve overall quality.

Electronic (FAX and e-mail) peer review predominates.


The heart of the matter is computer literacy. Growing up with, being familiar with, and making regular use of computers and computerized information systems will not become the norm unless the necessary atmosphere and background are part of one's upbringing. As mentioned in the introduction, the initial use of computers by chemists (and other scientists) was limited to performing simple calculations. Hence it is no surprise that the area of chemistry in which computers have been used most is computational chemistry. But usage even in this area is low. As Casale and Gelin [3] point out, "as a scientific discipline, computational chemistry is in its infancy". The current state of education in many parts of the world will make further usage difficult. However, I would hope that in college and graduate school there would be sufficient competence to train the upcoming generation of chemists to become very familiar with computers, through the introduction of computer application courses taught by chemists in chemistry departments. Without an increase in the level of computer literacy the remaining issues are pretty much irrelevant.

There are two facets to using computers: writing programs and using programs. Writing programs is really a rather limited issue. A computer is a tool. When a chemist gets too involved in the tool, he or she is, more often than not, no longer doing chemistry. What matters is using programs. To do this effectively and properly you need to know what a computer can do for you in the area in which you need to solve a problem. I don't need to be an automotive engineer to know that to get somewhere by car I need a car, need to know how to drive it, and need to know where I am going. The same is true with computers. Understanding what a computer can, or cannot, do is the important step. Then either finding software and hardware to do it, or getting someone to produce what is needed to get the job done, is relatively simple. I believe that virtually no chemists use computers as an end in themselves, and that chemists should use computers as one of many tools to do their job, but only if the computer is the most effective way, and not a barrier, to doing the job better, more effectively, and more efficiently.

Most chemists use computers only for administrative purposes (like writing a manuscript, which may or may not include chemical diagrams). I would argue that the reason for the lack of extensive use of computers is that the majority of computers readily available to the chemist (PC's of the 8086, 8088, 286, and 386 vintage) are of insufficient capacity and capability to do effective work other than word processing, structure drawing, and spreadsheet calculations. (Moreover, without suitable computers available, there has been no incentive to develop software for chemists.) Until very recently the computers with the necessary cpu speed (e.g., 486 cpu PC's, DEC, Sun, Silicon Graphics, and other workstations) and disk space to do a variety of scientific applications (modeling, quantum chemistry calculations, spectral interpretation and prediction, database searching of spectral data, image analysis, etc.) were much too expensive for most individual chemists to have on their desks or in their labs. As little as 2 years ago a computer with an Intel 286 cpu and a 40 MB hard disk was considered a state-of-the-art computer system. Almost nobody with a PC would keep a mass spectral database [9] and search system, requiring some 23 MB of hard disk space, on a computer system with a 40 MB hard disk. Today, to run a modern PC operating system (using DOS 5 and Windows, or OS/2) one needs at least an Intel 486 cpu (with a 50-66 MHz clock) and 300-500 MB of disk space. A recent article in a monthly computer magazine [10] added up the disk storage requirements for a little over a dozen pieces of popular business-oriented software; the total came to almost 100 MB, not including any disk space required for program swapping or for files of data and information.
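The disk-space argument above is simple arithmetic, sketched here in Python using only the figures quoted in the paragraph; the constant and function names are illustrative.

```python
# Disk-space figures quoted in the text, in megabytes.
OLD_DISK_MB = 40               # "state-of-the-art" hard disk of ~2 years ago
MS_DATABASE_MB = 23            # mass spectral database plus search system [9]
BUSINESS_SOFTWARE_MB = 100     # ~a dozen popular business packages [10]
MODERN_DISKS_MB = (300, 500)   # disk space needed for a modern PC today


def percent_of_disk(used_mb: float, disk_mb: float) -> float:
    """Percentage of a disk of `disk_mb` megabytes consumed by `used_mb`."""
    return 100.0 * used_mb / disk_mb


print(f"MS database on a {OLD_DISK_MB} MB disk: "
      f"{percent_of_disk(MS_DATABASE_MB, OLD_DISK_MB):.1f}% of the disk")
for disk_mb in MODERN_DISKS_MB:
    share = percent_of_disk(BUSINESS_SOFTWARE_MB, disk_mb)
    print(f"Business software on a {disk_mb} MB disk: {share:.1f}% of the disk")
```

The mass spectral database alone would occupy 57.5% of the older 40 MB disk, whereas the dozen business packages take between a third and a fifth of a 300-500 MB disk.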

In the next few years chemists will be able to replace existing low-power computers (e.g., 286 or equivalent PC's), or buy new ones, with the computing power of an Intel 486/586 (Pentium) cpu (or a Sun, Silicon Graphics, DEC, or equivalent workstation) and with sufficient disk storage space to run complex and powerful programs readily.

The low usage of computers by chemists in recent years may be attributable to the lack of affordable, adequate hardware, but it will take a number of years for this new and more powerful hardware to work its way into the system and into everyday use. Furthermore, unless software prices follow those of hardware, it is difficult to believe that many chemists will pay $2000 for a computer system and then spend thousands of dollars on additional software packages. Only low-cost, high-volume software is likely to succeed in the future. An experiment in mass marketing to the chemistry community is now being undertaken by Autodesk, which is hoping to increase the number of scientists using PC-based molecular modeling packages from a world-wide total of 5,000 to 100,000 or more [3]. Included in Autodesk's effort is a multi-million dollar grant program to encourage university use of their HyperChem molecular modeling software product.



TELECOMMUNICATIONS, NETWORKS, AND E-MAIL

Computers are used for electronic communication by a small, but growing, number of chemists. Among the reasons for the low usage are the lack of modems and dedicated phone lines, as well as the difficulty of finding where people or information are located and of initiating communication. There is also the lack of computer addresses on the necessary computer networks (Internet, BITNET, MCInet, Sprintnet, Compuserve, etc.) and the problem of connecting between networks.

With computer networks, there is no readily available phone book and no operator. While work on this problem is moving forward, a solution is still years away.
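To make the missing "phone book" concrete, here is a toy Python sketch of a white-pages directory that maps a person to addresses on several networks. The `Entry` and `WhitePages` names are hypothetical, invented for illustration; they do not describe any existing directory service, and the only entry used is the author's own address from the header of this paper.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One person in a hypothetical network 'white pages'."""
    name: str
    organization: str
    # Network name -> address; one person may be reachable on several networks.
    addresses: dict[str, str] = field(default_factory=dict)


class WhitePages:
    """Toy directory: look people up by surname, then pick a network."""

    def __init__(self) -> None:
        self._entries: list[Entry] = []

    def add(self, entry: Entry) -> None:
        self._entries.append(entry)

    def lookup(self, surname: str) -> list[Entry]:
        """All entries whose last name matches `surname`, ignoring case."""
        key = surname.lower()
        return [e for e in self._entries if e.name.split()[-1].lower() == key]

    def addresses_on(self, surname: str, network: str) -> list[str]:
        """Addresses on `network` for everyone matching `surname`."""
        return [e.addresses[network] for e in self.lookup(surname)
                if network in e.addresses]


book = WhitePages()
book.add(Entry("Stephen R. Heller", "USDA, ARS",
               {"Internet": "SRHELLER@ASRR.ARSUSDA.GOV"}))
print(book.addresses_on("heller", "Internet"))
```

The sketch shows only the lookup itself; the hard part the text alludes to is building such a directory across many organizations and networks and keeping it current.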

Today, practically all numbers (actually computer network addresses) are unlisted. However I can see changes coming. A few years ago a business card had a name, title, address, and phone number. Today many business cards have FAX and Internet addresses. This is part of computer literacy. This is progress. I believe it will still take years for chemists to make routine use of Internet and the related networks connected to it. From discussions with a number of people, the estimated usage of Internet by chemists was in the range of 10-15% of those who have a computer and can access computers outside their organizations [4]. Perhaps with the new political administration in Washington DC promoting e-mail addresses on Internet (e.g., Bill Clinton can be reached at PRESIDENT@WHITE-HOUSE.GOV) computer literacy will move forward a bit faster.

Electronic mail, or e-mail, is slowly (owing to a lack of knowledge about it) becoming a new type of network for chemists, as well as for other scientists [11]. It is not an "old-boy" network or "invisible college", because it allows anyone accessing these systems to be an "equal" of anyone else on the system. Electronic bulletin boards, discussion groups, and news groups, dealing with all subjects, are slowly sprouting up everywhere, each with dozens to hundreds of users. Of almost 800 such news groups surveyed by Kovacs, fewer than 100 are in the physical sciences, and of these only 20 are in chemistry [12]. Again a small percentage is observed when the topic relates to chemistry. For some examples of chemistry list-servers or news groups, see Heller's recent paper [13].

In a 1990 survey [14] it was estimated that some 10% of the overall working population in the USA and Canada used e-mail systems, vs. only 1.3% in Europe. By 1994 this survey estimated that usage in North America would grow to almost 29%, while in Europe usage would expand to just under 5%. Certainly there are cultural differences between those two areas in the use of the telephone and modems, but the European PTT's and their policies add to the difficulty of use. The ACS Computers in Chemistry (COMP) Division now distributes its newsletter via e-mail on Internet, as well as in hard copy. In mid-1992 a little less than 10% of the COMP members received the newsletter electronically [15], a number comparable with the survey mentioned above. I would expect e-mail usage in chemistry to be higher today, and e-mail to become a necessity by the end of this decade. The low cost, ease of use, and ability to send written information readily to colleagues around the world make this an ideal replacement for the existing "old-boy" network of phone calls, meetings, and letters. For anyone, from a Nobel prize winner to an undergraduate student, to be able to communicate freely and easily, and to see what topics and areas are of current interest, should improve scientific communication and research work. E-mail will reduce the time needed to send papers back and forth. Once the graphics problem is solved (both the technical standard for graphics and the speed of transmission of graphics) and put into practical use, e-mail will allow for real electronic journals [16]. This area has a great and important future for chemists throughout the world. For more details about networking, e-mail, and electronic publishing, please refer to the paper by Garson in these proceedings [17].
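The 1990 survey's projections imply a steady yearly growth rate, which the Python sketch below computes as a compound annual rate from the quoted 1990 and 1994 shares. Treating "almost 29%" as 29% and "just under 5%" as 5% is an assumption, so both results are slight overestimates; the function name is illustrative.

```python
def implied_annual_growth(start_share: float, end_share: float, years: int) -> float:
    """Constant yearly growth rate that turns start_share into end_share."""
    return (end_share / start_share) ** (1.0 / years) - 1.0


# Percent of the working population using e-mail, from the 1990 survey [14].
# Assumption: "almost 29%" is taken as 29% and "just under 5%" as 5%.
YEARS = 1994 - 1990
north_america = implied_annual_growth(10.0, 29.0, YEARS)
europe = implied_annual_growth(1.3, 5.0, YEARS)
print(f"North America: about {north_america:.0%} growth per year")
print(f"Europe: about {europe:.0%} growth per year")
```

On these assumptions the survey projected e-mail use growing by roughly 30% a year in North America and roughly 40% a year in Europe, the faster European rate starting from a far smaller base.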



INTERFACES

Another major problem with computer programs is the difficulty associated with their use. Pacman and Nintendo (the popular video games of the 1980's and early 1990's) never came with manuals. Some manuals seem more designed for weight lifting than for explaining how to use a particular computer program. Rarely are manuals available in computer-readable form [18]; in computerized form, a manual could be searched for any word of interest. Installing and running programs is a major energy barrier for most people. My philosophy is that if I must read the manual to use the computer program, I am probably better off without it. There is no way someone can become and remain proficient with a wide variety of programs, remembering what each does and how to perform particular tasks, as well as doing their assigned job as a chemist. Few people use their VCR's to record TV shows, because they can't figure out how to do it. This has even created a market for a device which automatically sets up the VCR to record, based on a set of 5 digits you type into the device. The 5 digits are published in newspapers in the USA every day next to each TV program listing.
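The point about computer-readable manuals can be illustrated with a minimal inverted index: every word is mapped to the pages it appears on, so looking a word up is a single dictionary access. The three-page "manual" and the function names are hypothetical, and this is a generic technique, not the method of any particular vendor.

```python
import re
from collections import defaultdict


def build_index(pages: dict[int, str]) -> dict[str, set[int]]:
    """Map every lower-cased word to the set of pages it appears on."""
    index: dict[str, set[int]] = defaultdict(set)
    for page_number, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_number)
    return dict(index)


def find(index: dict[str, set[int]], word: str) -> list[int]:
    """Sorted page numbers on which `word` occurs (exact word match)."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical three-page manual, for illustration only.
manual = {
    1: "Installing the program from floppy disk",
    2: "Printing a chemical structure",
    3: "Saving and printing spectra",
}
index = build_index(manual)
print(find(index, "printing"))   # pages that mention "printing"
print(find(index, "install"))    # exact-word matching: "installing" is not found
```

Exact-word matching keeps the sketch short; a practical search would also need to match word stems such as "install" and "installing".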

Table 1 describes today's interfaces as offensive and exacting. If people are not comfortable with a tool they will not use it. As computers become more powerful and better software engineers graduate and get jobs, one can only hope and expect that the interfaces in the year 2000 will become transparent and even voice based [19]. One way to accomplish this is through the extended use of graphics in computers. Today the use of high-resolution graphics (1024 x 1024 pixels) is low. Color screens are small (12-14 inches) and expensive. By the year 2000 I would expect that every computer will have a 20 inch or larger color monitor with at least 2048 x 2048 resolution, along with a color laser printer or plotter with the same capabilities. With the cost of hardware decreasing, this equipment should be available to most scientists in the coming decade. Related to the need for high-resolution graphics is the problem of how to transmit all the information quickly enough to be of practical use. Today's modem speeds of 2400 - 9600 baud are much too slow for graphics to be practical. With the current trend toward better networks and telecommunications, it seems reasonable to believe that the transmission speeds needed for chemistry graphics will be available in the next few years.
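Why 2400-9600 baud is too slow for graphics can be seen by estimating the time to send one 1024 x 1024 image. The Python sketch below makes several simplifying assumptions, flagged in the comments: one bit per baud, 8 bits per pixel, and no compression or protocol overhead. The function name is illustrative.

```python
def transfer_seconds(width: int, height: int, bits_per_pixel: int,
                     bits_per_second: float) -> float:
    """Seconds to send one uncompressed image, ignoring protocol overhead."""
    return width * height * bits_per_pixel / bits_per_second


# Assumptions: 1 baud carries 1 bit, 8 bits per pixel (256 colors),
# no compression, no error correction or framing overhead.
for rate in (2_400, 9_600, 1_000_000):
    seconds = transfer_seconds(1024, 1024, 8, rate)
    print(f"{rate:>9,} baud: {seconds:8.1f} s ({seconds / 60:5.1f} min)")
```

Under these assumptions a single image takes roughly an hour at 2400 baud, about a quarter of an hour at 9600 baud, and under ten seconds at the 1 million baud projected in Table 1.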



CHEMICAL IDENTIFICATION AND STANDARDS

In the area of chemical identification it has taken some 20 years for the CAS (Chemical Abstracts Service) Registry Number (CAS RN) to be used widely and routinely in databases and in searching for chemicals. While the CAS RN now reigns supreme for chemical identification, it suffers from the lack of any inherent intellectual value; it is, like the US Social Security number, an idiot number (notwithstanding its check digit), assigned sequentially over time. A larger number just means that the substance entered the CAS Registry system more recently. In the past few years optical scanning devices, coupled with advances in character and vector recognition, have led to the development of computer programs (see, for example, the work of Johnson et al. [20]) which are able to scan articles from the scientific literature (or from internal research reports) and extract chemical information, including connection tables of chemical structures and chemical reaction data (such as solvent, reaction temperature, etc.).
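The check digit mentioned above guards only against transcription errors; it carries no chemical meaning. The Python sketch below follows the published CAS scheme: each digit before the check digit is multiplied by its position counted from the right (the rightmost being 1), and the check digit is the sum modulo 10. It assumes the usual layout of two to seven digits, two digits, and one check digit; the function names are our own.

```python
import re


def cas_check_digit(digits: str) -> int:
    """Check digit for the digits of a CAS RN that precede the check digit.

    Each digit is multiplied by its position counted from the right
    (rightmost digit = 1); the check digit is the sum modulo 10.
    """
    return sum(position * int(d)
               for position, d in enumerate(reversed(digits), start=1)) % 10


def is_valid_cas_rn(rn: str) -> bool:
    """Check the usual NNNNNNN-NN-N layout and the check digit."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", rn)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    return cas_check_digit(body) == int(match.group(3))


print(is_valid_cas_rn("7732-18-5"))  # water
print(is_valid_cas_rn("7732-18-6"))  # same number with a mistyped check digit
```

For water, 8×1 + 1×2 + 2×3 + 3×4 + 7×5 + 7×6 = 105, and 105 mod 10 = 5, the final digit of 7732-18-5. Nothing in the calculation says anything about water itself, which is the point made in the text.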

In spite of the wide use of the CAS RN in chemistry, and particularly in chemical regulation by the EPA (Environmental Protection Agency) [21], chemical names are still very widely used for administrative and regulatory purposes. In fact, the recently developed AUTONOM program [22] was initially conceived for internal processing at the Beilstein Institute for their handbook and database work. AUTONOM takes most (>75%) [23] chemical structures and creates an IUPAC-approved name for each structure. Its administrative value for internal and regulatory purposes is such that it is now a commercial package [24]. Thus, while there will continue to be a need for chemical names and registry numbers, the primary need will not be a scientific one.

The ease of creating large databases of chemical structures, along with the efforts underway to create standard molecular data descriptions of molecules (e.g., the SMD (Standard Molecular Data) [25] and STAR (Self-defining Text Archive and Retrieval) [26] projects) and the increased ability to send large volumes of data over networks at high speeds, make it seem reasonable to predict that the use of the CAS RN for searching for a chemical will decrease over time. One of the major drawbacks of the CAS RN (and the Beilstein Registry Number as well) is the absence of these numbers from the private, and generally confidential, files of companies. It is not possible to use an internal identification number to search public files, and vice versa. Only the chemical structure itself, when used as the "search term", will be a practical way to see if a chemical is in another database. As different organizations represent their structures slightly differently, only the advent of a standard molecular representation or an interchange program (such as the recently developed program ConSystant [27]) will allow a user to readily search for related structures in another database of chemical structures.
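To illustrate why the structure itself can serve as the search term even when two files number their atoms differently, the Python sketch below stores a structure as a minimal connection table (element symbols plus bonds given as atom-index pairs with a bond order) and tests whether two tables describe the same structure by trying atom-to-atom mappings. The data layout is our own assumption, not SMD, STAR, or ConSystant, and hydrogens are omitted. This brute-force graph-isomorphism check is practical only for very small molecules; it is not how production registry systems work.

```python
from itertools import permutations

# A connection table: element symbols for the (heavy) atoms, plus bonds as
# (atom_index, atom_index, bond_order). Hydrogens are left implicit.
ConnectionTable = tuple[list[str], list[tuple[int, int, int]]]


def same_structure(t1: ConnectionTable, t2: ConnectionTable) -> bool:
    """True if t1 and t2 describe the same structure under some atom renumbering.

    Brute force over all renumberings: fine for a handful of atoms,
    hopeless for real databases.
    """
    atoms1, bonds1 = t1
    atoms2, bonds2 = t2
    if sorted(atoms1) != sorted(atoms2) or len(bonds1) != len(bonds2):
        return False
    bond_order2 = {frozenset((a, b)): order for a, b, order in bonds2}
    for mapping in permutations(range(len(atoms2))):
        # mapping[i] is the atom of t2 that atom i of t1 corresponds to.
        if any(atoms1[i] != atoms2[mapping[i]] for i in range(len(atoms1))):
            continue
        if all(bond_order2.get(frozenset((mapping[a], mapping[b]))) == order
               for a, b, order in bonds1):
            return True
    return False


# Ethanol numbered two different ways, and its isomer dimethyl ether.
ethanol_file_a: ConnectionTable = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
ethanol_file_b: ConnectionTable = (["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])
dimethyl_ether: ConnectionTable = (["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])
print(same_structure(ethanol_file_a, ethanol_file_b))  # same molecule
print(same_structure(ethanol_file_a, dimethyl_ether))  # same formula, different structure
```

Real systems avoid the factorial search by computing a canonical numbering once per structure, but the sketch shows why agreement on a representation, rather than on an identification number, is what allows two files to be compared.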



RAW VS. PROCESSED INFORMATION

Most of the data and information in the major chemical databases of the world are raw and unprocessed. The two largest collections, those of CAS [28] and VINITI [29], are bibliographic. In these two databases, whatever the author says is accepted at face value. Since almost all of the papers abstracted are refereed, either the author's abstract is used or CAS or VINITI writes an abstract based on the information provided in the publication. Only the Beilstein and Gmelin databases perform some measure of evaluation, although most of this work is really extraction of information. For example, in the Beilstein Handbook, online database, and CD-ROM Current Facts, the data are extracted. In the past some additional effort was made to ensure that enough information was published in the original work to guarantee that the work could be reproduced, and not every chemical reaction or piece of data was used by Beilstein. The Beilstein staff has never had the financial capability to evaluate the very large volume of data they process. Such data evaluation is rare, the best-known example being that of the US NIST/SRD (National Institute of Standards and Technology, Standard Reference Data Program). With the possible exception of the Mass Spectrometry database [9], none of the NIST databases is of any significant size in comparison to the number of chemicals for which data are available. Beilstein performs a "second" and valuable peer review, albeit too late to keep questionable, poorly defined, or unexplained science from being published. In any event, today, owing to the high cost of labor, both the Beilstein Institute and VINITI have fewer in-house staff than in past years and rely more on part-time outside workers. With some 65% of the costs at CAS being labor, it is reasonable to believe that CAS will be moving in this direction again. (CAS once had nearly all of its abstracting done by outside chemists.)



CD-ROM

CD-ROM's are computer hardware devices that are just beginning to find use in chemistry. Again the lack of good software, adequate computer hardware, and available databases has limited the growth and use of this medium. While most of the educational science libraries in the USA have CD-ROM drives, it is estimated that currently perhaps some 1% of the computers in chemistry labs and libraries have a CD-ROM drive [30]. This estimate is supported by a non-scientific, non-systematic request for information which the author sent to the approximately 400 subscribers to the Chemical Information News Group (see above); it resulted in two responses, one from Exxon and one from Rutgers (Chemistry and Physics Departments) [31]. In both cases the information specialists who replied indicated that they knew of no end users in their organizations who had CD-ROM's on their PC's. In addition to this survey, a number of vendors of CD-ROM's were contacted. All considered the sales and types of users to be confidential information, and none kept track of the type of users who were buying their products. In one case, that of the Aldrich Chemical catalog on CD-ROM, it was learned that while sales of the catalog on CD-ROM (at $25 per CD-ROM) were under 1000, Aldrich publishes 2.7 million copies of its printed catalog [32], which are distributed free of charge. Even in an area where computers are used more routinely in chemistry, namely computational chemistry, less than 10% of the customers using the molecular modeling program SYBYL have requested to receive their software updates on the CD-ROM offered by the vendor. This could be compared with the computer science community, where more than 80% of the users of Sun workstations receive their software and documentation on CD-ROM. Thus chemists clearly have a long way to go before they become as comfortable with this medium as computer programmers and computer systems staff are.
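The adoption figures quoted in the preceding paragraph can be put side by side with a few lines of Python. Only numbers from the text are used; because they are given as bounds ("under 1000", "less than 10%", "more than 80%"), the results are bounds too. The constant names are illustrative.

```python
# Figures from the text; each is a bound, so each result is a bound.
ALDRICH_CD_SALES_MAX = 1_000         # CD-ROM catalogs sold: "under 1000" [32]
ALDRICH_PRINT_COPIES = 2_700_000     # printed catalogs distributed free [32]
SYBYL_CD_SHARE_MAX = 0.10            # SYBYL customers taking CD-ROM updates
SUN_CD_SHARE_MIN = 0.80              # Sun users receiving software on CD-ROM

aldrich_ratio_max = ALDRICH_CD_SALES_MAX / ALDRICH_PRINT_COPIES
sun_vs_sybyl_min = SUN_CD_SHARE_MIN / SYBYL_CD_SHARE_MAX

print(f"Aldrich CD-ROM catalogs: under {aldrich_ratio_max:.3%} of printed copies")
print(f"Sun users take CD-ROM updates at more than {sun_vs_sybyl_min:.0f} times "
      "the rate of SYBYL users")
```

CD-ROM sales of the Aldrich catalog thus amount to less than 0.04% of the printed copies, and workstation users in computer science take CD-ROM updates at more than eight times the rate of SYBYL users.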
At present I would consider the chemists' use of CD-ROM to be in its infancy, but I strongly feel that, with proper pricing, the growth curve for CD-ROM usage is likely to be exponential in the coming years, as evidenced by the use of CD-ROM in other fields, where prices are quite low (and volume is high). With the price of an internal CD-ROM drive (one that fits in the same slot as a PC floppy disk drive) expected to drop under $200 in 1994, usage should increase. Perhaps some bright marketing person will discover that offering a "free" CD-ROM drive with the purchase of their product will do wonders to overcome the current energy barrier to buying a CD-ROM drive.

CD-ROM's, which today store about 660 million characters (about 330,000 pages of text), will, by the year 2000, replace many reference books and chemical catalogs on the chemist's bench and bookshelf. A few pioneers in this area, such as the Beilstein Institute in Frankfurt, Germany, are leading the way to what will clearly be the library of the future. The Beilstein Current Facts CD-ROM holds about one year of data extracted from the literature (without author names, titles, or abstracts), along with a chemical structure search system, all neatly collected on a single CD-ROM. Someday the weekly issue of Chemical Abstracts will reach each chemist this way. Each chemist will have the Merck Index, the CRC Handbook of Chemistry and Physics, the ACS Directory of Graduate Research, and a few ACS journals, all on CD-ROM. By the year 2000 it should be possible to custom order a set of books on CD-ROM. For example, the several hundred books of the ACS Symposium Series could be converted into computer-readable form, and books could then be "printed" on a CD-ROM on demand, the same way floppy disks are copied today. Using keywords or phrases, you could select the books you want on your bookshelf (actually your CD-ROM jukebox) and send the order for such a disk to be mastered and mailed to you. Custom orders would certainly cost more than pre-packaged ones but, if marketed and priced favorably, should be well within the means of most chemists. Groups of chemists, such as polymer or materials chemists, could create their own CD-ROM's from volumes already in print. IUPAC could create a CD-ROM of Pure and Applied Chemistry. The list is almost endless.
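As a quick consistency check on the capacity figures quoted above, the two numbers imply an average of about 2000 characters per page of text. The per-page figure is derived here from the paragraph's own numbers; it is only a rough average for plain text, not a figure given in the original.

```latex
\[
\frac{660 \times 10^{6}\ \text{characters}}{330\,000\ \text{pages}} = 2000\ \text{characters per page}
\]
```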

All that is needed for CD-ROM to become widely used, and for almost every chemist to have his or her own private library (the way it was in days of yesteryear), is reasonable pricing. A price of $2000 - $5000 or more for a CD-ROM (from Silver Platter, CRC, or Chapman & Hall) is not likely to attract many customers. The ACS has just released the Directory of Graduate Research (DGR) on CD-ROM [33], at twice the cost of the hard copy version. The reason given for doubling the cost is that the product is "much more valuable". The DGR even limits the number of hits you can print out, for fear that the ACS mailing list business would suffer if users could print off 700 names and addresses from the CD-ROM. At present one can print only 50 names and addresses at a time, and in rather poor form at that: names are in inverted order, some items are capitalized, and so on.

The current mentality of "this is more valuable, so let's charge more" needs to be replaced by "this is more valuable, let's reduce the price and sell a lot more". "High volume" seems to be a phrase that has been genetically engineered out of the minds of those selling CD-ROM products in chemistry. One might wonder what a PC would cost today, and how many would be sold, if PC's were priced like CD-ROM chemistry products.



ELECTRONIC BOOKS AND JOURNALS

The last specific topic covered in this paper is books, journals, and online chemical information. In the online area, the current usage of scientific and technical databases shows that the current generation of chemists is not very familiar with computers and chemical information. The costs of searching the chemical literature (including connect time, search hits, printouts, and so forth) are high, averaging well over $100 per hour connected to a host mainframe computer. Compared with browsing through a book, a journal, or an issue of the printed Chemical Abstracts, this is expensive. Most of the information is not evaluated, and the details of a chemical synthesis method or the properties of a molecule or material are often not in the abstract at all; they must be found by reading the journal article or book chapter. Because abstracting and indexing is, and I believe always will be, a very labor-intensive effort (even with such expected developments as the potentially useful software of Johnson et al. [20]), the fixed cost of creating the information is high. There are two ways to recover such costs: charge a lot of people small sums of money, or charge a few people a lot of money. The chemical information industry, for the most part (there are a few exceptions), has opted for high prices. The results are what most would expect. Few of the hundreds of thousands of chemists referred to at the beginning of this article use computerized databases, and few subscribe to weekly literature searching of online databases (Selective Dissemination of Information, SDI). The reason is primarily economic: schools and even many companies cannot afford to have hundreds of chemists spending such large sums of money on literature and related online searching.
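The cost-recovery argument above can be written as a simple break-even relation. This is an illustrative sketch under assumptions not stated in the text: the symbols are hypothetical, with F the fixed cost of producing a database (abstracting and indexing), N the number of paying users, and p the price each pays, and the cost of serving one more user is treated as negligible.

```latex
\[
p\,N \ge F \quad\Longrightarrow\quad p_{\min} = \frac{F}{N}
\]
```

Under these assumptions, doubling the number of users halves the price needed to break even. The two strategies named in the text are the two ends of this curve: a small p with a large N, or a large p with a small N.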
Hopefully some database producers and vendors will begin to experiment with marketing to the thousands of potential users waiting for reasonably priced products. Years ago many people had personal subscriptions to sections of Chemical Abstracts, to journals, and so on. Will the computer revolution in general, and CD-ROM's in particular, cause history to come full circle? I believe that by the year 2000 this is a distinct possibility, provided vendors change the way they market their products. While books will never disappear from the chemist's desk, I think CD-ROM will become the preferred medium of distribution and use in many areas of chemical information, including reference works, collections of books and articles on a particular subject, and catalogs of chemical supplies, software, and software updates.

As for computer-based journals, as stated in Table 13, publishing in a printed journal is now the socially accepted means of communication and leads to rewards and promotions. While the means of communication can easily change, the social reward system is quite another matter. Universities and most other organizations that use peer review rely very heavily on refereed scientific journals in their evaluation criteria. While I feel my career has not suffered because of the software and databases I have written and developed, I do not think this is the usual case. Experimental journals with a substantial portion of their activity in non-hard-copy form are now starting to appear. One such case was Tetrahedron Computer Methodology (TCM) [34], which died after some four years for a variety of technical and nontechnical reasons. A new, partly online journal, the Online Journal of Current Clinical Trials (OJCCT), has finally gotten off the ground and is now available [35, 36]. This journal has more institutional support than TCM, so it may make inroads in this area. Another new journal is Protein Science [37], a biochemistry journal that started publishing in January 1992 and comes with a floppy disk of graphics, which the journal calls "kinemages". In any event, I expect that these experiments, coupled with better delivery mechanisms (for chemistry this is primarily software for the transmission and viewing of graphics), will by the end of the decade lead a few journals to make real headway toward giving the chemical community electronic journals. Additional examples of electronic publishing activities (journals, electronic libraries, and so forth) and publishing experiments can be found in a recent article by Borman [38].



ECONOMIC ISSUES [39]

The recent (and perhaps ongoing) recession in a number of developed countries has led to a re-examination of how to sell products. When people don't fly, airlines lower their fares to fill seats. When people don't buy automobiles, General Motors, Ford, and Chrysler, along with foreign car companies, lower prices to stimulate sales. When hotels have occupancy rates below 50% and need 65% occupancy just to break even financially, they offer cheap rooms. There are many more examples outside the chemical information area, but it should suffice to say that the Japanese domination of the consumer electronics industry clearly shows that lower prices lead to higher volumes and generally higher profits. For examples in chemical information, I need only mention the 11th edition of the Merck Index [40] (priced at $30) and the CRC Handbook of Chemistry and Physics [41] (priced at $100), now in its 73rd edition. Both sell tens of thousands of copies.

In chemical information there seems to be a pervasive attitude that information is valuable and prices must be high. Information is no doubt valuable, as evidenced by state and corporate intelligence gathering. However, the notion that something MUST be priced higher than the corresponding printed product because it is computer-readable or in electronic form is probably not the way to increase usage. Eastman Kodak has recently released its chemical catalog on a floppy disk (similar to the Aldrich chemical catalog on CD-ROM mentioned earlier in this article). Some highly paid Kodak employee decided to charge $20 for a copy, versus a free printed catalog of several hundred pages. Given the current cost of postage, printing and mailing the printed catalog is certainly more expensive than sending a floppy disk. Why then charge for the disk?

A recent front-page Wall Street Journal article on the Electronic Campus [42] discussed what the future may hold for the publishing industry and for libraries in the next decade. The article described how most textbook publishers are moving into the electronic age very slowly, if at all. One electronic product described in the article is a CD-ROM about Greek and other ancient civilizations. It contains 25 volumes of Greek text (with English translation), a Greek dictionary, and some 6000 photos and drawings of artifacts. All this sells for $120, much less than the price of the corresponding printed products. This sort of entrepreneurial effort is likely to increase sales of such materials and is likely to be what lies in the future. Clearly, as the article points out, textbooks are getting so expensive (and are usually heavier than a portable computer containing much more information) that students are buying fewer of them. The books they do buy often come from used book stores (where the publisher earns nothing), since new books are now so costly.

In 1978 total annual revenues from online information (scientific and non-scientific, primarily legal, information) were about $40 million [43]. By 1990 they had grown to an annual rate of $690 million. The most successful computer chemistry software company, Molecular Design Ltd (MDL) of San Leandro, California, has seen its revenues grow over roughly the same period from zero to about $50 million per year. The molecular modeling companies, of which there are at least a half dozen, probably have combined annual revenues below MDL's current sales. (These revenue figures are based on software sales and exclude hardware and consulting/consortium groups.) Compared with other industries, and especially with other areas of the computer industry, these revenues are low and hardly impressive [44]. I would hope that companies in this field will begin to experiment with new marketing approaches that both increase the usage of their products and reach a larger segment of the chemistry population. The Autodesk effort with its HyperChem software is one bright example in a gloomy field. Without a greater volume of usage, information may remain a commodity for only a small portion of the chemical community.
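To put the online revenue figures in perspective, growth from about $40 million in 1978 to $690 million in 1990 corresponds to a compound annual growth rate g of roughly 27% over the twelve intervening years. The rate is derived here from the two figures cited [43]; it is not a number reported in that source.

```latex
\[
g = \left(\frac{690}{40}\right)^{1/12} - 1 = 17.25^{1/12} - 1 \approx 0.268
\]
```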

To reinforce what was said above about the pricing of CD-ROM products in chemistry, one of the best summaries of the economic and pricing problems that the chemical information community has not been able to deal with was recently given by Harry Collier, who stated [45]:

"After over 20 years, it appears to us that confusion still reigns because too few people in this branch of the information business have a realistic assessment of what their market is. 'Oh, yes,' they will say on Monday, 'we have a high-value product which we sell mainly to a specialist niche marketplace.'

'But also,' they will say on Tuesday, 'we would like additionally to reach a market of thousands of eager end-users and expand our usage. And we also like to cater for impecunious academics, for under-developed nations and for small users.'"



SUMMARY

I believe there are two main reasons for the low use of computers and computer systems by chemists: cost and ease of use. Up to this point in time, the economics of chemical information have made it a tool for a few users: the wealthy in the more developed nations of the world and the wealthier companies in those countries. Easier-to-use computer systems will begin to generate more usage. This will lead, slowly, to lower individual pricing schemes, in spite of the current marketing policies of most electronic chemical information providers. This should, in turn, really begin the age of lower individual computerized chemical information costs. (The classic chicken-and-egg situation.) I believe that with the current and upcoming generation of hardware with the power of an Intel 486/Pentium (at 50-300 MHz) or an equivalent UNIX-based Sun, DEC, or Silicon Graphics workstation, software can be designed and implemented with two main features. The software will be reasonably easy to use (and easy to remember the next day or week, when a program or database system is accessed again), and it will be powerful enough to do the actual job that needs to be done. By powerful I mean that the software will have the necessary "user-friendly" interfaces (graphics, mouse, voice command, and so on) and will have some AI (artificial intelligence) capability and knowledge of the subject to assist the end user in getting his or her job done. However, without close cooperation between software developers, database producers, and their end users, this will not happen. Both the software and the databases need to be properly designed to meet the actual needs of end users, not the needs that vendors perceive users to have.
Talking to and, more importantly, listening to the customer or end user is something the chemical information and related industries will have to come to grips with in the next few years if real and substantial progress is to be made for both parties.

Computers and the related technology described in this article hold the promise that by the 21st century more chemical information and computer systems will be available to the entire worldwide community. Larger numbers of users should allow the costs of the products being developed to be spread across many more people, leading to higher usage, higher productivity, and lower costs for all computer-related products.



ACKNOWLEDGEMENTS

The author wishes to acknowledge numerous colleagues who have provided information and comments on this paper. Most of these people are specifically acknowledged in the references.



REFERENCES

[1] Based on lectures given at the 10th International Conference on Computers in Chemical Research and Education (ICCCRE), Jerusalem, Israel, July 1992 and the Second International Conference on Computer Applications to Materials and Molecular Science and Engineering, Yokohama, Japan, September 1992.

[2] Tom Greeves, Daratech Inc., 140 6th Street, Cambridge, MA 02142. Phone: 617-354-2339.

[3] Casale, Charles T. and Gelin, Bruce R., "Growth and Opportunity in Computational Chemistry", 1992, The Aberdeen Group, 92 State Street, Boston, MA. Phone: 617-723-7890, FAX: 617-723-7897. There is also a more extensive 1989 report published by the same organization entitled "Conflicting Trends in Computational Chemistry".

[4] Ouchi, G., Brego Research, private communication. Venkataraghavan, R., Lederle Labs, private communication. Pierce, T., Rohm & Haas, private communication. Witiak, J., Rohm & Haas, private communication. Woodruff, H., Merck Labs, private communication. Low usage is defined as 0-25%, moderate as 26-49%, high as 50-75%, and very high as 76-100%.

[5] Attempts to find evidence or even an estimate of the number of chemists who have PC's have proven futile. The few firms that have done market surveys in this field or for computer sales in general, such as the Aberdeen Group [reference 3], Daratech, Inc., Dataquest, and International Data Corporation, had no information, nor did they have any idea where to get such information.

[6] In October 1992 STN began to deliver SDI results electronically, but only to an STNmail ID. STNews, 8, #10, page 1, October 1992, North American Edition.

[7] In the PC field a steady flow of books comes with computer disks whose contents are intimately related to the book. For example, a disk of DOS enhancement programs comes with the Dvorak book on DOS and PC performance. Dvorak, J. C. and Anis, N., "Dvorak's Inside Track to DOS & PC Performance", ISBN: 0-07-881759-5, $39.95. Osborne McGraw-Hill, 1992. 2600 Tenth Street, Berkeley, CA 94710. Phone: 510-549-6600, FAX: 510-549-6603.

[8] Wiggins, G. D., "Chemical Information Sources", McGraw-Hill Series in Advanced Chemistry, ISBN: 0-07-909939-4, McGraw-Hill, New York, 1991. This book includes a "Chemistry Reference Sources Database" of 2156 records plus the Pro-Cite search software for IBM PC's. Pro-Cite is available from Personal Bibliographic Software, PO Box 4250, Ann Arbor, MI 48106-4250. Phone: 313-996-1580; FAX: 313-996-4672.

[9] PC version of the NIST/EPA/NIH Mass Spectral Database, March 1992 Version. Available from NIST/SRD, Bldg 221/A320, Gaithersburg, MD 20899 (Phone: 301-975-2208; FAX: 301-926-0416). The price is $1200 for the database or $200 for those who had bought previous versions.

[10] PC World, page 210, August 1992.

[11] Heller, S. R., "The Future of Chemical Information Activities", J. Chem. Inf. Comput. Sci., 33, 284-291 (1993).

[12] One list of directories of academic e-mail conferences is available from the Kent State University file server. It was compiled by Diane K. Kovacs and is copyrighted. It is available via the Internet by ftp (file transfer protocol) from KSUVXA.KENT.EDU. The directory lists are in the LIBRARY directory of that computer and are titled ACADLIST.FILEx, where x is 1 to 7, depending on your area of interest (physical sciences, biological sciences, etc.). File7 contains the news groups in the physical sciences and mathematics.

[13] New user groups are being formed all the time. However, it remains to be seen how long it will take for these to take hold and show staying power. Springer-Verlag, the company that distributes the Beilstein database, started a system for users of the Beilstein database on the CompuServe computer system. After a year, usage was found to be too low to continue it. In the past few months, user group conferences on HyperChem software, CHARMm software, organic chemistry (actually a restart of a system that died a few years ago), Amber software, BioSym software, and SYBYL software have been started. IUPAC hopes to initiate one in the near future for its many members and affiliates around the world. It will be interesting to see how many of these remain, and in what form, in 1-2 years.

[14] BIS Strategic Decisions Global Electronic Messaging Service, 1991.

[15] Pierce, T., Rohm & Haas, private communication.

[16] Meadows, A. J., and Buckle, P., "Changing Communication Activities in the British Scientific Community", J. Doc., 48, 276-290 (1992).

[17] Garson, L., "Data Design Issues in Creating Electronic Products for Primary Chemical Information", Proceedings of the Annecy Conference, pages xxx-xxx (1993).

[18] The WordPerfect Corporation makes its WordPerfect manuals available on disk, which can be searched for words using any word processor.

[19] Calem, R. E., "Coming Soon: The PC With Ears", New York Times, Business Section, page F9, August 30, 1992.

[20] Ibison, P., Johnson, A. P., Kam, F., Neville, A. G., Simpson, R., Tonnelier, R., and Venczel, T., "Automatic Extraction of Chemical Information from the Literature", page 25, Abstracts of the 10th ICCCRE, Jerusalem, Israel, July 1992.

[21] EPA Order 2880.2, "Use of Chemical Abstracts Service Registry Data in ADP Systems", June 30, 1975.

[22] Goebels, L., Lawson, A. J., and Wisniewski, J. L., "AUTONOM: System for Computer Translation of Structural Diagrams into IUPAC-Compatible Names. Nomenclature of Chains and Rings", J. Chem. Inf. Comput. Sci., 31, 216-225 (1991). Wisniewski, J. L., "AUTONOM: System for Computer Translation of Structural Diagrams into IUPAC-Compatible Names. 1. General Design", J. Chem. Inf. Comput. Sci., 30, 314-332 (1990).

[23] The figure of 75% refers to chemicals published in the current organic chemistry literature. The program does not handle stereochemistry, charged species, inorganics, peptides, or sugars. A. J. Lawson, private communication.

[24] AUTONOM is an IBM PC software program and costs $1980 (industry price) or $980 (academic price). It is available from Springer-Verlag Publishers, 175 Fifth Avenue, New York, NY 10010. Phone: 212-460-1622, FAX: 212-533-5781.

[25] Barnard, J. M., "Draft Specification for Revised Version of the Standard Molecular Data (SMD) Format", J. Chem. Inf. Comput. Sci., 30, 81-96 (1990).

[26] Hall, S. R., "The STAR File: A New Format for Electronic Data Transfer and Archiving", J. Chem. Inf. Comput. Sci., 31, 326-333 (1991).

[27] ConSystant is an IBM PC-based DOS program available for $199 from ExoGraphics, PO Box 655, West Milford, NJ 07480-0655. Phone: 201-728-0188, FAX 201-728-0735.

[28] The American Chemical Society established the Chemical Abstracts Service in 1907. The printed Chemical Abstracts and the related Chemical Abstracts databases are available from CAS, 2540 Olentangy River Road, PO Box 3012, Columbus, OH 43210-0012. Phone: 614-447-3600, FAX: 614-447-3713. At present there are some 9.5 million abstracts in the CA computer-readable database and about 11.5 million chemical structures with CAS Registry Numbers (REGN) in the structure file associated with the bibliographic database. There are some 17.5 million names associated with the 11.5 million structures.

[29] VINITI, The All Russian Institute of Scientific and Technical Information, was established in 1952. Since that time it has collected over 31 million source documents in all areas of science (not just chemistry). Of these there are some 11 million abstracts in computer-readable form. Its main publication is "Referativnyi Zhurnal VINITI". VINITI is located at 20a Uslevitcha Street, Moscow 125219, Russia. Phone: 011-7-095-152-6163, FAX: 011-7-095-943-0060. VINITI distributes its products outside of Russia through Access Innovations, Inc., 4314 Mesa Grande S.E., Albuquerque, NM 87108. Phone: 505-265-3591, FAX: 505-256-1080.

[30] Badger, R., Springer-Verlag, New York, private communication. Gary Wiggins, Chemistry Library, Indiana University, private communication.

[31] D. Johnson, Exxon Corporation, private communication. H. Dess, Rutgers University/Chemistry & Physics Library, private communication.

[32] Publication Department, Aldrich Chemical Company, 940 West St. Paul Avenue, Milwaukee, WI 53233. Phone: 414-273-3850.

[33] Directory of Graduate Research, CD-ROM edition, ACS Books, ACS, 1155 16th Street, NW, Washington DC, 20036 (1993). Phone: 202-872-4600.

[34] Tetrahedron Computer Methodology (TCM) was published by Pergamon Press (UK) between 1988 and 1992.

[35] The Online Journal of Current Clinical Trials (OJCCT), a joint venture of the American Association for the Advancement of Science (AAAS) and the OCLC Online Computer Library Center, Inc. The price is $95 plus monthly telecommunication charges. For further information contact the journal at 1333 H Street, NW, Washington, DC 20005. Phone: 202-326-6446.

[36] On 28 August 1992 it was announced (Science, Volume 257, 4 September 1992, page 1341) that OJCCT [35] has linked up with the journal "The Lancet" so that "The Lancet" could publish a printed, abridged form of a current OJCCT article.

[37] Borman, S., C&E News, pages 26-27, February 17, 1992.

[38] Borman, S., "Advances in Electronic Publishing Herald Changes for Scientists", C&E News, pages 10-24, June 14, 1993.

[39] Heller, S. R., "The Economic Future of Numeric Databases in Chemistry", Proceedings of the 15th International Online Information Meeting, London, December 1991, pages 47-50. Published by Learned Information, Oxford, UK.

[40] The Merck Index, 11th Edition, S. Budavari, Editor, Merck & Co. Inc., Rahway, NJ 07065-0900 USA, 1989 (Phone: 908-594-4904).

[41] CRC Handbook of Chemistry and Physics, 73rd Edition, D. Lide, Editor, CRC Press Inc., 2000 Corporate Blvd., N.W., Boca Raton, FL 33431 USA, 1992 (Phone: 407-994-0555, FAX: 407-994-3625).

[42] Cox, M., "Electronic Campus - Technology Threatens to Shatter the World of College Textbooks", Wall St. Journal, page 1, June 1, 1993.

[43] Williams, M., "Highlights of the Online Database Industry", pages 1-4, Proceedings of the National Online Meeting, New York, May 1992. Published by Learned Information Inc., 143 Old Marlton Pike, Medford, NJ 08055 (Phone: 609-654-6226, FAX: 609-654-4309).

[44] For example, the sales figure of $50 million for MDL sales after 10 years can be compared with that of Lotus Development Corporation, which was $53 million in its first year (1983). To date over 9 million copies of Lotus 1-2-3 have been sold. Barron's magazine, September 14, 1992, page 12.

[45] Collier, H., Monitor, #148, pages 2-3, June 1993.