A Computer-Based Chemical Information System

Chemical data stored in a central computer can be used internationally in real time and at low cost.

S. R. Heller, G. W. A. Milne, R. J. Feldmann

A considerable improvement in the power of computers in the 1960's prompted work at the National Institutes of Health (NIH) and at the headquarters of the Environmental Protection Agency (EPA) to explore the feasibility of an online chemical information system (1). Several capabilities were identified as necessary in such a system, including storage capacity, search and retrieval from libraries of chemical data, computer-assisted analysis of chemical data, and retrieval of information from the chemical literature. The types of numerical data under consideration included mass spectra, carbon-13 nuclear magnetic resonance (NMR) spectra, and x-ray diffraction data. Bibliographic data dealing with mass spectrometry and x-ray crystallography have also been incorporated into the system.

In view of the well-known difficulties implicit in the use of batch-oriented computers in information retrieval, it was considered essential that an on-line interactive computer be used for these tasks, and so the NIH PDP-10 was chosen as the computer to be used in these experiments. A chemist who searches through a data base often possesses a good deal of information, perhaps poorly defined, that is related to the problem at hand. A well-programmed computer can interrogate him as he is interrogating the data base and so prompt him to recall as much of this additional information as is feasible to solve the problem at hand. In such an interactive system, the computer can report to the user that "17,450 citations satisfy the criteria" that were specified and ask "Do you want to alter the criteria?" If so, the computer can interrogate the user about the new criteria. With each modification in criteria, the user can see how the number of hits changes; in this way the question can be "tuned" until it gives a manageable number of answers, which can then be either sampled by the user or printed in toto.

Dr. Heller is a computer specialist in the Management and Information Data Systems Division, Environmental Protection Agency, Washington, D.C. 20460. Dr. Milne is a research chemist in the National Heart, Lung, and Blood Institute, Bethesda, Maryland 20014. Mr. Feldmann is a computer specialist in the Division of Computer Research and Technology, National Institutes of Health, Bethesda, Maryland 20014.

To build such a chemical information system one must first locate or generate a data base that can be used in each component. Then programs for interactive searching of the data bases must be written that will permit the direct interrogation of a disk-stored data base so that information may be retrieved at low cost. The structuring of the files on the disk and the exact details of the disk input-output statements are of prime importance in this connection. Next, a means must be devised by which the components can be linked. A user, having identified several compounds from their mass spectra, may next wish to examine their NMR spectra, and some means should be available by which he can readily do so. Finally, a mechanism must be found by which a working program-data base combination can be disseminated to as large a group of users as possible. That the system attract many users is important, because costs to the individual user can be minimized in this way and also because the users are a major source of new data for the system.

The NIH-EPA Chemical Information System (CIS), built on the general principles described above, consists of a series of PDP-10-resident numerical and bibliographic data bases together with a battery of interactive, conversational computer programs that can be used to search for and retrieve information from any of the data bases. In addition, there are interactive programs that will permit analysis of data, either to reduce them to a form in which they can be used in the searching programs or as an end in itself. New components or updates are merged into the CIS by the mechanism shown schematically in Fig. 1.

The data bases which comprise the CIS include mass spectra, carbon-13 NMR spectra, x-ray diffraction data for organic and inorganic molecules, and x-ray powder diffraction patterns. All of these files can be searched structurally; that is, they can be examined for the presence of a given chemical structure or substructure. The files are linked by means of Chemical Abstracts Service (CAS) registry numbers, which are unique chemical identifiers. There are also three bibliographic files which contain the literature citations on mass spectrometry, x-ray diffraction of organic molecules, and gas-phase proton affinities.

The analytical programs that are available can accomplish the iterative analysis of complex NMR spectra, general curve-fitting, and linear regression analysis. Other programs can be used to calculate isotopic enrichment from mass spectral data or, for a given species in solution, to find the molecular conformation with the lowest energy.

Computer Networks

In a computer network (2), the data bases and all the associated programs are stored in a central computer. This computer center has its own dedicated communications system, which permits worldwide, 24-hour access through a local telephone call carried over telephone landlines or satellite links, as shown schematically in Fig. 2. A user in Basel can, using a telephone-coupled terminal, obtain access to the programs and data base on the central computer. Communications are essentially instantaneous; the normal pauses associated with time-sharing are generally more noticeable than the signal transmission times (3).

Fig. 1 (left). Addition of a component to the Chemical Information System; DCRT, Division of Computer Research and Technology, National Institutes of Health. Fig. 2 (right). Schematic diagram of a computer network.

The overall cost of communications is divided among all users on a per hour basis, and the usual cost per user is now about $10 to $15 per hour of connect time. Irrespective of one's location, the cost of contacting the CIS, which may be many thousands of miles away, is nearly negligible, and the speed of computer response is essentially instantaneous.

The only other way to avoid long distance telephone calls is to maintain several copies of the data bases in different locations. However, this arrangement incurs higher costs for computer storage and also presents formidable problems in connection with the updating of data bases. When a network is available, the data need be stored on a disk only once and the whole system can be updated by simply replacing the old data base, programs, or both, with new ones. Global networking has the further advantage that all communication with users of a particular program can be handled by means of system messages. It is not even necessary to know who a user is to reach him; his signing onto the system identifies him as a user and prompts the system to print out any messages of importance to such individuals.

A disadvantage of networks is that the techniques they employ run counter to the telecommunications policies of a number of governments. The difficulties range from the relatively simple problem that the use of acoustic couplers is not allowed in some countries to the less manageable case of a government demanding 51 percent local ownership of any network nodes in the country. Another problem is that a network involves considerable overhead costs. To ensure that these costs will be covered, some commitment such as an annual subscription fee must be required. This is reasonable for those who plan moderate or heavy use of the system, but it discourages infrequent usage.

Mass Spectral Search System (MSSS)

This is the oldest component of the CIS and has been developed (4) from programs written in 1971 by Heller (5). The data in this component are unit resolution electron ionization mass spectra collected over about a decade by a number of groups, such as the Mass Spectrometry Data Centre (MSDC) in England and the NIH and EPA in the United States. The complete MSDC-NIH-EPA collection currently contains some 30,000 mass spectra of different organic compounds, each of which has a CAS registry number, a Wiswesser line notation (WLN) (6), and a connection table, which is a mathematical representation of the two-dimensional structure of the molecule. This file is available for lease from the National Bureau of Standards (7).

The MSSS is being operated by MSDC via the Cyphernetics Division of ADP Network Services, Inc. (8), and to use it one must pay an annual subscription fee of $300 per institution. The user is also subject to charges for connect time and computation.

The complete mass spectra are stored on the computer (9), but all the actual searching is carried out on an inverted file of abbreviated spectra; the abbreviated spectra (10) consist of the two most intense peaks in every range of 14 mass units in each full spectrum, that is, mass-to-charge (m/e) ratio of 6 to 19, 20 to 33, and so on. The file of abbreviated mass spectra is used to generate an inverted file, a list of m/e values versus the identification numbers of spectra in which there is an ion at the particular m/e value. The intensity of the ion is appended to the identification number. When an m/e value is entered as part of a query, the computer reads out from that part of the inverted file the identification numbers of the spectra that contain such an ion, and, while doing this, it checks whether each intensity falls within the range defined by the user as acceptable. This inverted file search (5) is very fast, requiring only about 1 to 2 seconds of PDP-10 processor time. More importantly, it is largely independent of file size.
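
The abbreviation and inversion steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MSSS code: the function names, the dictionary layout, the tie-breaking rule, and the two toy spectra are all hypothetical.

```python
from collections import defaultdict

WINDOW_START = 6   # first window covers m/e 6 to 19, as in the text
WINDOW_WIDTH = 14  # each window spans 14 mass units


def abbreviate(spectrum):
    """Keep the two most intense peaks in each 14-mass-unit window.

    `spectrum` maps integer m/e values to relative intensities (0 to 100).
    """
    windows = defaultdict(list)
    for mz, intensity in spectrum.items():
        if mz < WINDOW_START:
            continue  # below the first window; ignored in this sketch
        windows[(mz - WINDOW_START) // WINDOW_WIDTH].append((intensity, mz))
    abbreviated = {}
    for peaks in windows.values():
        # Strongest first; ties go to the higher m/e (an arbitrary assumption).
        for intensity, mz in sorted(peaks, reverse=True)[:2]:
            abbreviated[mz] = intensity
    return abbreviated


def build_inverted_file(library):
    """Map each m/e value to a list of (spectrum id, intensity) pairs."""
    inverted = defaultdict(list)
    for spectrum_id, spectrum in library.items():
        for mz, intensity in abbreviate(spectrum).items():
            inverted[mz].append((spectrum_id, intensity))
    return inverted


# Toy library with made-up spectra, for illustration only.
library = {
    1: {41: 30, 43: 100, 44: 5, 57: 60, 58: 10, 71: 20},
    2: {43: 15, 57: 100, 71: 80, 85: 40},
}
inverted = build_inverted_file(library)
print(sorted(inverted[57]))  # [(1, 60), (2, 100)]
```

In this layout a query on one m/e value touches only the list stored under that value, which is why the cost of a lookup depends little on the size of the whole file.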

In a typical PEAK search through the mass spectral data base (Fig. 3) the user is asked for an m/e value and lower and upper intensity limits on the conventional scale (0 to 100 percent). If only one intensity is provided in the response, the program works with a window of ±30 percent about this intensity. The first entry in the example shown, m/e 283, intensity between 10 and 40 percent, produces 329 spectra that satisfy this criterion. The second peak entered, m/e 301, produces a certain number of spectra that satisfy that criterion alone but which are not reported to the user; instead, the computer seeks identification numbers that are common to this second list and the original 329 answers. In this case, 40 spectra result. A third peak of m/e 245 reduces this list of hits to one, at which point the user types "one" and the answer, pregn-4-ene-3,20-dione, 17-hydroxy-16,alpha.methyl-, number 23445, is printed out, together with the appropriate CAS registry number (REGN), spectrum quality index (QI), molecular weight (MW), and molecular formula (MF).
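
The successive narrowing in a PEAK search amounts to intersecting hit lists. The following Python sketch shows the idea on a made-up inverted file; the function name and data are hypothetical, and the ±30 window is interpreted here as percentage points on the 0 to 100 scale, which is an assumption.

```python
def peak_search(inverted, queries, default_window=30.0):
    """Intersect the hit lists for a series of peak queries.

    Each query is (m/e, low, high) or (m/e, intensity); the short form gets
    a window of +/- `default_window` about the stated intensity.
    Returns the running hit count after each peak and the final id set.
    """
    hits = None
    counts = []
    for query in queries:
        if len(query) == 2:
            mz, intensity = query
            low = max(0.0, intensity - default_window)
            high = min(100.0, intensity + default_window)
        else:
            mz, low, high = query
        matching = {sid for sid, inten in inverted.get(mz, []) if low <= inten <= high}
        # Only identification numbers common to all peaks so far survive.
        hits = matching if hits is None else hits & matching
        counts.append(len(hits))
    return counts, (hits or set())


# Made-up inverted file: m/e -> [(spectrum id, intensity), ...]
inverted = {
    283: [(101, 25), (102, 35), (103, 80), (104, 12)],
    301: [(101, 50), (102, 5), (104, 60)],
    245: [(101, 90), (104, 20)],
}
counts, hits = peak_search(inverted, [(283, 10, 40), (301, 40, 100), (245, 20)])
print(counts, hits)  # [3, 2, 1] {104}
```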

Typically, the user next retrieves the complete mass spectrum corresponding to number 23445. One may obtain a partial or complete digital listing of m/e values and relative intensities from a computer-generated microfiche copy of the data base or one may use the retrieval option SPEC. If the user has a vector display terminal, he may retrieve the mass spectrum in the conventional bar graph form by using the option PLOT. A hard copy of the bar graph may also be obtained, either as a photocopy of the screen or by means of an x-y plotter, depending upon the equipment available.

This data base can be searched in other ways, for example, for specific molecular weights or complete or partial molecular formulas. The compilation may also be searched by means of structure codes--arbitrary numerical codes used to define a particular structural feature or compound type.

Each of these secondary search modes becomes very much more powerful when combined (Boolean AND) with the PEAK search. One can, for example, search for all compounds of a given molecular weight that have certain mass spectral peaks. The option PMF permits, as an example, identification of all the compounds containing 19 carbon atoms and one sulfur atom that give a base peak in their mass spectra of m/e 71. There is just one entry in the file that satisfies these criteria, that for the antidepressant drug thiazesim.

The program KB accepts the complete mass spectrum, and, using the technique developed by Hertz et al. (10), finds those spectra that are most similar to that of the unknown. This program, which can be run very inexpensively overnight when computer charges are lower, is suited to the user whose mass spectrometer-computer system is in contact, by telephone, with the MSSS.

Other programs in the MSSS are particularly designed for more effective user interaction with the system. The program CRAB may be used to register complaints and also to report errors that have been found in the data, and changes in the system can be disseminated to users by means of the program NEWS. Each search program has associated with it a HELP program. A user, doubtful of the workings of, for example, PMF, may type HELP PMF, and a short explanation of that program will be printed out at the terminal.

The MSSS operates with a series of fixed computer charges which range from $1 to $6. The most expensive search option is KB, which currently costs $6 ($2 in overnight batch) and which involves a sequential search through all the abbreviated spectra in the file. This option is very extensively used, particularly by those whose laboratory systems are on-line to MSSS by way of direct minicomputer interfaces. Interfacing of laboratory minicomputers to MSSS has proved to be one of the more successful experiments in MSSS, and a number of manufacturers (11) are now marketing mass spectral data-acquisition systems that can be connected by telephone to the MSSS. The problem that arises when the mass spectrum being investigated has been measured upon a mixture of compounds has been addressed by means of the reverse searching program, PBM (12). This procedure is necessary because spectra in the data base are generally measured with pure compounds.
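
The idea behind a reverse search can be illustrated with a deliberately simplified Python sketch. It is not the PBM algorithm of reference (12): it merely asks, for each reference spectrum, what fraction of its significant peaks appear in the unknown, and ignores extra peaks in the unknown because they may come from other components of a mixture. All names, thresholds, and spectra are hypothetical.

```python
def reverse_match(reference, unknown, min_intensity=10.0):
    """Fraction of the reference's significant peaks present in the unknown.

    Peaks of the unknown that are absent from the reference are ignored.
    """
    significant = [mz for mz, inten in reference.items() if inten >= min_intensity]
    if not significant:
        return 0.0
    found = sum(1 for mz in significant if unknown.get(mz, 0.0) > 0.0)
    return found / len(significant)


def reverse_search(library, unknown, threshold=0.9):
    """Return (score, id) pairs for references contained in the unknown."""
    scored = [(reverse_match(ref, unknown), sid) for sid, ref in library.items()]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


# Made-up reference spectra of three pure compounds and a two-component mixture.
library = {
    "A": {43: 100, 58: 40},
    "B": {77: 100, 105: 60, 51: 30},
    "C": {91: 100, 92: 60},
}
mixture = {43: 80, 58: 30, 77: 100, 105: 55, 51: 20, 39: 5}
print(reverse_search(library, mixture))
```

A forward comparison of the mixture against each pure spectrum would penalize the peaks contributed by the other component; the reverse comparison does not, which is the point made in the paragraph above.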

The total number of MSSS transactions per month is currently about 4000. The bulk of the searching is with PEAK and KB. Other searches are used significantly, however, with the possible exception of LOSS, which is a program designed to find all spectra exhibiting a given neutral loss from the molecular ion. Programs that remain idle in this way are ultimately removed from the MSSS. Over 250 laboratories, representing about 150 different organizations, are currently using MSSS, and this number is increasing at the rate of about one per week.

Carbon-13 Nuclear Magnetic Resonance Search System (CNMR)

This relatively recent addition to the CIS is currently operating in a "pilot" version. The data base consists of entries for 4000 spectra representing the same number of compounds. As a result of very vigorous collaboration between scientists in the United States, Germany, Hungary, Switzerland, the Netherlands, and Japan, this file is now being expanded very rapidly and, it is hoped, will contain some 10,000 entries within a year.

Each entry consists of the compound name, molecular formula, and CAS registry number. The proton-decoupled carbon-13 chemical shifts, with the relative intensities (when available), assignments (that is, the atoms responsible for the respective signals), the multiplicity (S, singlet; D, doublet; T, triplet; Q, quartet) of the single-frequency, off-resonance decoupled signals, and the experimental conditions under which the measurements were made are appended, as are the WLN for the compound and a compound classification code. A connection table and a numbered structure diagram for the compound are also included.

Once assembled, this data base can be searched in some very powerful ways. In a file of CNMR data, one must be able to identify each carbon atom uniquely and so numbered connection tables for every compound must be available. The display of a chemical structure at a command is simple once the connection table is accessible; more importantly, the presence in the file of the connection table makes it possible to search for structure fragments or substructures, a particularly useful capability in a file of NMR data.

Observed chemical shifts can be used (13) to search through the CNMR file to retrieve all the spectra that have the same shift or shifts. A typical search of this sort is shown in Fig. 4. The user is asked to enter a chemical shift in parts per million on the tetramethylsilane scale that is normally used in CNMR spectroscopy and a permissible deviation from this value. The search is then carried out in exactly the same way as in MSSS, except that no intensity information is used, and the number of hits is reported to the user. At this point, he can terminate the search, inspect the hits that have already been found, or enter another shift, in which case the search is repeated with the second shift, the two lists of hits are intersected, and the number of spectra with both shifts is reported to the user. In addition to the chemical shift, the user may enter the single-frequency, off-resonance decoupled multiplicity of the signal. This information is used as an additional filter on the number of hits.
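
A shift search with a permissible deviation and an optional multiplicity filter might look like the following Python sketch. The entry layout, the function name, and the shift values are invented for illustration and are not taken from the CNMR file.

```python
def shift_search(entries, queries):
    """Intersect hit lists for (shift_ppm, tolerance_ppm, multiplicity) queries.

    `entries` maps an entry id to a list of (shift in ppm, multiplicity) lines.
    A multiplicity of None means that no multiplicity filter is applied.
    """
    hits = None
    counts = []
    for shift, tolerance, multiplicity in queries:
        matching = {
            entry_id
            for entry_id, lines in entries.items()
            if any(
                abs(s - shift) <= tolerance and (multiplicity is None or m == multiplicity)
                for s, m in lines
            )
        }
        hits = matching if hits is None else hits & matching
        counts.append(len(hits))
    return counts, (hits or set())


# Illustrative entries with invented shifts (S, D, T, Q as in the text).
entries = {
    1: [(128.5, "D"), (21.3, "Q"), (137.8, "S")],
    2: [(128.9, "D"), (170.2, "S")],
    3: [(21.0, "Q"), (60.1, "T")],
}
counts, hits = shift_search(entries, [(128.7, 0.5, None), (21.2, 0.3, "Q")])
print(counts, hits)  # [2, 1] {1}
```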

Once the identification number and the name of an entry of interest have been retrieved from the SHIFT or molecular formula search programs, the entry itself can be printed out or examined on the computer-generated microfiche copy of the data base just as in MSSS. The SPEC option retrieves the name, the structural and molecular formulas of the compound, and the shifts in its CNMR spectrum. Also provided, as available, are the relative intensities of the lines, expressed as a percentage of the most intense line, the single-frequency, off-resonance decoupled multiplicity of each line, and its assignment. An example of the output from the SPEC program is given in Fig. 5. The other options in the CNMR search system are the NEWS, CRAB, and HELP programs, which operate just as in MSSS. The CNMR search system is maintained upon the network by the Netherlands Organization for Chemical Information, which pays the annual disk storage costs for the data base. An annual subscription fee of $100 associated with the use of this system will be instituted in January 1977, and the computer charges for searching are set at $1 or less per search.

X-ray Crystal Structure Retrieval Program

The Cambridge Crystal Structure File is a collection of some 10,000 organic crystal structures that have been reported since 1960 (14). The file is leased from Cambridge University by NIH on behalf of the entire United States and currently contains a bibliographic section in addition to the structural data. To this file the NIH and EPA have added registry numbers, standard nomenclature, and connection tables.

Computer programs have been written by Feldmann et al. (15) to search, display, and manipulate data of the sort that are in this file. These programs use the device-independent Omnigraph display software developed by Sproull (16).

The file may be searched in a variety of ways to find compounds that fulfill specific criteria such as molecular formula, molecular weight, or coordinate data. As with all the other CIS data files, there is in this file a connection table for every compound and so searches may be conducted for a specific structure or substructure as described below. Once an entry has been identified, the molecule can be displayed and the display inspected, rotated, or redisplayed as a stereo pair. Torsion angles, dihedral angles, and interatomic distances can be calculated. This system is available on the network and can be accessed by way of a vector cathode-ray tube terminal or a lower-speed printer terminal, but the latter cannot handle any graphical representation. All the component programs except the substructure search are transaction-priced.
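
The geometric quantities mentioned above follow directly from atomic coordinates. The Python sketch below computes an interatomic distance and a signed torsion angle from Cartesian coordinates; it assumes the coordinates have already been converted from fractional cell coordinates to an orthogonal frame, and the function names and test points are hypothetical.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def distance(p, q):
    """Interatomic distance between two Cartesian points."""
    d = sub(p, q)
    return math.sqrt(dot(d, d))


def torsion_angle(p1, p2, p3, p4):
    """Signed torsion angle p1-p2-p3-p4 in degrees.

    Positive when, looking from p2 toward p3, the front bond must be
    turned clockwise to eclipse the rear bond.
    """
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


print(distance((0, 0, 0), (3, 4, 0)))                              # 5.0
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))   # 90.0
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # 180.0
```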

X-ray Crystal Data Retrieval Program

The Crystal Data Determinative Tables produced by the National Bureau of Standards (NBS) contain x-ray diffraction data for some 24,000 single crystals, both organic and inorganic (17). Each entry consists of the cell parameters (A, B, C, alpha, beta, and gamma), the number (Z) of molecules in the unit cell, the measured and calculated density, the molecular formula, and, depending upon the crystal system, two determinative ratios (for example, A/B and A/C); in the near future each entry will also have a CAS registry number.

The programs for searching through this data base, under development at NIH, permit searches for specific space groups, densities, molecular formulas, or unit cells of given characteristics. Any two of these searches can be intersected in a binary AND.

Completion of the development and testing of the software for this component of the CIS is scheduled for 1977. The data base and programs will then be made generally available through the network.

X-ray Powder Diffraction Retrieval Program

A compilation of the powder diffraction patterns of some 27,000 materials has been assembled by the Joint Committee on Powder Diffraction (18). The data for each entry consist of the relative intensity, normalized to that of the most intense line, together with the d spacing of the line. The single problem in searching through such a file lies in the fact that, in practice, powder diffraction patterns are very often measured upon mixtures, whereas the patterns in the data base are derived from relatively pure compounds. For a problem of this sort, the reverse search technique (19) upon which the PBM component of the MSSS is based is well suited.

Work is now in progress to adapt a reverse search program to the x-ray powder diffraction file and subsequently to make the resulting system available. It is not expected that this work will be completed before early 1977, and the expected cost to users of this program, which will be managed by the Joint Committee on Powder Diffraction, is at present unknown.

Substructure Searching (SSS)

For the chemist a very desirable method of using the CIS is to search for every occurrence of a complete structural formula or fragment, as opposed to a molecular formula. This procedure is termed substructure searching (20) and involves a search through a file of connection tables for the part that has been specified by the user.

A preliminary step in substructure searching is the preparation of a query, and this may be done interactively as shown in Fig. 6. Hydrogen atoms are implicit, and nodes (atoms) that are not further defined are considered to represent carbon. With these programs, even complicated structures can be developed in an elapsed time of under 5 minutes, corresponding to a few seconds of computer time at most.

Once the appropriate connection table has been generated, a substructure search can be initiated. A typical first step would be a fragment probe search, in this case, for the occurrence of atom-centered fragments identical to node 2, the most characteristic node in the substructure. This results in 53 hits. A subsequent search (RPROBE) for all compounds containing a pyrrolidine ring substituted at the 2-position gives seven hits; intersection of these two files of answers results in only one compound, proline, CAS registry number 147853, that satisfies all the criteria that have been defined. Small files can be examined for exact matches by the program SUBSTRUCTURE SEARCH. This program seeks a precise correspondence between the user-generated substructure and the structures in the file by an atom-by-atom search of each structure.
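
An atom-by-atom comparison of a query connection table with a file structure can be sketched as a small backtracking search in Python. This is an illustration only, not the SUBSTRUCTURE SEARCH program: the connection-table layout and function names are hypothetical, hydrogens are implicit as in the text, and bond orders are ignored for brevity.

```python
def adjacency(atoms, bond_list):
    """Build node -> set of bonded nodes from a list of (node, node) bonds."""
    bonds = {node: set() for node in atoms}
    for a, b in bond_list:
        bonds[a].add(b)
        bonds[b].add(a)
    return bonds


def find_substructure(query_atoms, query_bonds, target_atoms, target_bonds):
    """Return one mapping of query nodes onto target nodes, or None.

    *_atoms maps node number -> element symbol; *_bonds maps node number ->
    set of bonded node numbers.
    """
    order = list(query_atoms)

    def extend(mapping):
        if len(mapping) == len(order):
            return dict(mapping)
        q = order[len(mapping)]
        for t in target_atoms:
            if t in mapping.values() or target_atoms[t] != query_atoms[q]:
                continue
            # Every query bond to an already-mapped node must exist in the target.
            if all(mapping[n] in target_bonds[t] for n in query_bonds[q] if n in mapping):
                mapping[q] = t
                result = extend(mapping)
                if result is not None:
                    return result
                del mapping[q]  # backtrack and try the next candidate atom
        return None

    return extend({})


# Proline: pyrrolidine ring (N1, C2 to C5) with a carboxyl group on C2.
proline_atoms = {1: "N", 2: "C", 3: "C", 4: "C", 5: "C", 6: "C", 7: "O", 8: "O"}
proline_bonds = adjacency(
    proline_atoms, [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1), (2, 6), (6, 7), (6, 8)]
)
# Query fragment: N-C-C(O)(O), an amino acid backbone without the ring.
query_atoms = {1: "N", 2: "C", 3: "C", 4: "O", 5: "O"}
query_bonds = adjacency(query_atoms, [(1, 2), (2, 3), (3, 4), (3, 5)])

print(find_substructure(query_atoms, query_bonds, proline_atoms, proline_bonds))
# {1: 1, 2: 2, 3: 6, 4: 7, 5: 8}
```

The cost of such a search grows quickly with the size of the file, which is consistent with the text's restriction of exact matching to small files obtained from cheaper probe searches.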

An expensive but necessary step in the preparation of each data file in the CIS is to assign connection tables and CAS registry numbers to each compound in the file. As a result, the substructure search (SSS) programs will operate on any of the CIS files and any other file such as CHEMLINE (21) that has connection tables. The computer costs for the use of SSS are variable; the average cost of a substructure search is between $5 and $20.

These costs are uncomfortably high for many users, and, since further software optimization appears to be yielding diminishing returns, the use of structure codes as a presearch device to SSS is being investigated. In this approach, a series of about 400 numerical structure codes (known as CIDS codes) are computer-generated from the data base of the connection tables (22). The first step in SSS will be a presearch based upon the CIDS codes. If the efficiency of the program is substantially improved by this approach, the structure code search will perhaps be merged into SSS as an obligatory presearch procedure.

Mass Spectrometry Bulletin Search (BULL)

The Mass Spectrometry Bulletin, a publication of the United Kingdom Atomic Weapons Research Establishment, consists of about 56,000 abstracts of papers dealing with mass spectrometry published since the mid-1960's. Each citation is reduced to a number of subject key words or codes; in addition, the author's name, the journal reference, the relevant MSDC codes, and elements are retained. The resulting files, which comprise the Mass Spectrometry Bulletin, are used to generate a disk-resident file of citations which can be searched on the basis of the above features. This interactive search system is now part of MSSS, all search options being transaction-priced at $3.

This file can now be searched on the basis of subject, subject code, MSDC code, element, author's name, and the general index of the Bulletin. Boolean AND and NOT operators can be applied to these searches, and one can, for example, locate all the papers dealing with tungsten except those that also treat rhenium.

Because the author, subject, and general indexes of the Bulletin use nonsystematic terms (author's name, for example), misspelling of queries is common. The conversational programs deal with this problem by conducting the search with the first few letters of the input. The different answers that are obtained are listed for the user as shown in the example of the subject search given in Fig. 7. The first query retrieves 1575 references in which data-processing is discussed. The second query limits the answer list to the 55 book references in which data-processing is discussed. Only 17 of these also deal with biochemistry, and only five with isotope analysis. In an author search, as shown in Fig. 8, a request is made for all papers published by Novotny, whose initials are uncertain, and Janak. The two retrieved citations can then be listed as in Fig. 8, at which point the user has the option of limiting the citations listed to those of specific years or of continuing the search with the name of a third author added. It is also possible to search the Bulletin for papers dealing with specific elements or with subjects that appear in the general index.
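
The two mechanisms described for BULL, matching on the first few letters of a term and combining answer lists with Boolean AND and NOT, can be sketched in Python as follows. The index layout, the function names, the prefix length of four letters, and the citation numbers are all assumptions made for illustration.

```python
def prefix_lookup(index, query, prefix_len=4):
    """Return {term: citation ids} for index terms sharing the query's first letters."""
    stem = query.strip().upper()[:prefix_len]
    return {term: ids for term, ids in index.items() if term.upper().startswith(stem)}


def combine(include, exclude=()):
    """Boolean AND over the `include` sets, then NOT over the `exclude` sets."""
    result = set.intersection(*include) if include else set()
    for ids in exclude:
        result -= ids
    return result


# Made-up indexes: term -> set of citation numbers.
subject_index = {"TUNGSTEN": {1, 2, 3}, "RHENIUM": {3, 4}, "DATA-PROCESSING": {2, 5}}
author_index = {"NOVOTNY M": {10, 11}, "NOVOTNY J": {12}, "JANAK J": {11, 13}}

# Papers on tungsten except those that also treat rhenium.
print(combine([subject_index["TUNGSTEN"]], exclude=[subject_index["RHENIUM"]]))  # {1, 2}

# An author whose initials are uncertain: list every candidate, then intersect.
candidates = prefix_lookup(author_index, "Novotny")
print(sorted(candidates))  # ['NOVOTNY J', 'NOVOTNY M']
print(combine([candidates["NOVOTNY M"], author_index["JANAK J"]]))  # {11}
```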

X-ray Crystal Literature Retrieval Program

A section of the Cambridge Crystal Structure File (14) currently contains some 14,000 literature citations to published crystallographic work. This file has been made a part of the CIS, and search programs have been appended to it by H. J. Bernstein of the Brookhaven National Laboratory. The file may be searched for a given structure or substructure, author, molecular formula, or molecular weight.

Proton Affinity Retrieval Program

The gas-phase proton affinity determines the behavior and utility of a molecule in chemical ionization mass spectrometry. H. M. Rosenstock and his coworkers at NBS have produced a file of some 500 measured proton affinities together with an annotated bibliography of the appropriate literature citations. The proton affinity data are being merged into MSSS, and the bibliographic component of this file will be appended to the BULL component.

Graphical Interactive NMR Analysis Program (GINA)

A problem that frequently arises in NMR spectroscopy is that a spectrum is too complex to yield to first-order analysis. The program GINA (23) is designed to deal with this problem by using estimated values for the various coupling constants and chemical shifts and calculating the expected NMR spectrum. The user can compare this spectrum with the observed spectrum, then alter one or more of the variables, and compare the new calculated spectrum with the observed spectrum. In this way, an iterative approach is made to the true coupling constants and chemical shifts with the user acting as the transducer in the feedback loop. The program, which can be used with graphics terminals or teletypes, is running at NIH and is currently being merged into the CIS.

Mathematical Modeling System (MLAB)

There are many scientists who could use the mathematical power of computers but who are dissuaded by the need for programs. A number of interactive program packages have been designed to overcome this difficulty, and one of the more powerful of these, MLAB, developed at NIH (24), has been incorporated into the CIS.

This program is designed to accept a set of data from the user and to perform, upon command, any of a wide variety of mathematical manipulations upon these data. These include linear, nonlinear, and multiple regression; scalar and matrix computation; differential calculus; initial and boundary value problems; root-finding; and minimization.

Isotopic Label Incorporation Determination (LABDET)

As a result of the difficulties surrounding the use of radioisotopes in medical research, stable isotopes are playing an increasingly large role in this area. At lower levels of isotopic incorporation, however, there arises in mass spectrometric detection and quantitation the difficulty that naturally occurring carbon consists of a large amount of carbon-12 mixed with a small amount of carbon-13 (about 1 percent). Stable isotopes of other elements, for example, deuterium, nitrogen-15, and oxygen-18, occur naturally in very small amounts, mixed with their major isotope. In the mass spectrum, fragment ions formed by the loss of a hydrogen atom from the molecular ion alter the observed ion intensities. It is thus not entirely straightforward to derive the incorporation levels from the mass spectral data, and the LABDET program has been written (25) to deal with this problem.

The program accepts the mass spectra of the unlabeled and the labeled compound. From these, an estimate is made of the level of isotope incorporation in the labeled compound. A theoretical spectrum for this level of isotope is then calculated and compared with the experimental spectrum. An iterative process to fit the estimated isotope level to the spectrum of the labeled compound culminates after a specified number of cycles in the calculation of a correlation coefficient. This program, which handles a tedious calculation very rapidly, has been merged into the CIS and is transaction-priced at $2.
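
The fitting step can be illustrated with a much simplified Python sketch. It is not the LABDET algorithm of reference (25): it assumes that the labeled sample is a mixture of unlabeled molecules and molecules carrying a fixed number of label atoms, replaces the iterative fit with a closed-form least-squares estimate, and ignores hydrogen-loss fragments. The function names and the ion-cluster intensities are synthetic.

```python
import math


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def shifted(pattern, n):
    """Shift an ion cluster up by n mass units, keeping the list length.

    The input must be padded with trailing zeros so nothing is cut off.
    """
    return ([0.0] * n + list(pattern))[: len(pattern)]


def pearson(a, b):
    """Correlation coefficient between two equal-length intensity lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def estimate_incorporation(unlabeled, labeled, n_labels):
    """Least-squares labeled fraction x in: labeled = (1 - x) * u + x * shift(u)."""
    u = normalize(unlabeled)
    s = normalize(shifted(unlabeled, n_labels))
    obs = normalize(labeled)
    diff = [si - ui for si, ui in zip(s, u)]
    x = sum((o - ui) * d for o, ui, d in zip(obs, u, diff)) / sum(d * d for d in diff)
    x = min(1.0, max(0.0, x))  # keep the estimate physically meaningful
    model = [ui + x * d for ui, d in zip(u, diff)]
    return x, pearson(obs, model)


# Synthetic ion clusters at M, M+1, ..., M+4 (intensities are invented).
unlabeled = [100.0, 10.0, 1.0, 0.0, 0.0]
labeled = [70.0, 7.0, 30.7, 3.0, 0.3]  # constructed as 70% unlabeled + 30% doubly labeled
fraction, r = estimate_incorporation(unlabeled, labeled, n_labels=2)
print(round(fraction, 3), round(r, 3))  # 0.3 1.0
```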

Conformational Analysis of Molecules in Solution (CAMSEQ)

The conformation of a molecule in solution is related to but not necessarily the same as that in the crystal state. The major purpose of the program package CAMSEQ, written by Weintraub and Hopfinger (26), is to calculate by empirical and quantum mechanical techniques the molecular conformation of a particular molecule that has the lowest free energy in solution.

The program can work with coordinate data supplied by the user, or it can generate coordinate data from a molecular structure. It then systematically alters the torsion angles in the molecule to generate new conformations, for each of which statistical thermodynamic probabilities are calculated, based on the use of potential (steric, electrostatic, and torsional) functions and terms for the free energy associated with hydrogen-bonding, molecule-solvent, and molecule-dipole interactions.
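
The scan over torsion angles and the assignment of thermodynamic probabilities can be sketched in Python for a single rotatable bond. The energy function below is a hypothetical threefold torsional term only; CAMSEQ's steric, electrostatic, hydrogen-bonding, and solvent terms are not modeled, and the function names, barrier height, and step size are assumptions.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol K)


def torsional_energy(angle_deg, barrier=3.0):
    """Hypothetical threefold torsional potential in kcal/mol (not CAMSEQ's)."""
    return 0.5 * barrier * (1.0 + math.cos(math.radians(3.0 * angle_deg)))


def scan(energy_fn, step_deg=30, temperature=298.15):
    """Step one torsion angle through 360 degrees and weight each conformation.

    Returns (angle, energy, Boltzmann probability) for every step.
    """
    angles = list(range(0, 360, step_deg))
    energies = [energy_fn(a) for a in angles]
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / (R_KCAL * temperature)) for e in energies]
    z = sum(weights)  # partition function over the sampled conformations
    return [(a, e, w / z) for a, e, w in zip(angles, energies, weights)]


results = scan(torsional_energy)
best = max(results, key=lambda row: row[2])
print(best[0], round(best[1], 6))
print([a for a, e, _ in results if e < 1e-9])  # [60, 180, 300]
```

With several rotatable bonds the same loop would run over every combination of angles, which suggests why the core and time demands mentioned below become significant.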

This program package is currently running on the NIH PDP-10, where the software is being optimized. In its present form, the core demand of the programs is very high (about 42,000 36-bit words), and, although there are economic questions concerning the feasibility of making the program immediately available through a network, it is hoped that this will be accomplished in 1977.

CIS Management

Each component of the CIS, when it leaves the U.S. government computer and enters the private sector, does so under the auspices of a non-United States government sponsor. This step is taken in conformity with the Office of Management and Budget circular A76 (27) and further ensures that the particular program is then subject to the normal free market forces. If it consistently loses money, that is, if it generates insufficient revenues in subscription fees to defray the costs of disk storage, the sponsor, who pays the disk storage charges, is free to decline further sponsorship and the program, at this point, becomes at least temporarily a dead letter.

A number of details in the CIS structure deserve some further discussion because they represent interesting questions that remain open, and these are treated separately below.

User Manuals

A great deal of effort has been expended to ensure that CIS components are sufficiently alike that a user can switch from one component to another without any extensive reeducation process. Nevertheless, two further steps are taken to assist the user. First, every program in the system has associated with it a HELP file which can be accessed at any time by a user in difficulty. Second, there is written for each component an extensive manual. These manuals, which often exceed 50 pages, are written with some care and cover every aspect of the program they describe. The writing and the printing of these manuals are expensive, and this problem has not yet been solved except by the unsatisfactory approach of selling copies of the manual to users.

Future Directions

As the CIS grows and revenues accruing from the use of the system increase, it seems likely that a dedicated PDP-10 could be leased or purchased jointly by the sponsors of CIS components. This would, to a considerable extent, reduce the computer costs discussed above. Increased use of the CIS will assure that the current subscription fees and computation transaction prices will be maximum values and not bases from which prices will increase. Such a prediction is based upon the expectation of a level of use that has not yet been demonstrated with the few components that are currently available to the scientific community.

References and Notes

1. R. J. Feldmann, S. R. Heller, K. P. Shapiro, R. S. Heller, J. Chem. Doc. 12, 41 (1972); S. R. Heller, in Computer Representation and Manipulation of Chemical Information, W. T. Wipke, S. R. Heller, R. J. Feldmann, E. Hyde, Eds. (Wiley, New York, 1974), pp. 175-202.

2. S. R. Kimbleton and G. M. Schneider, Comput. Surv. 7, 129 (1975).

3. The major portion of the distance involved in satellite transmissions is in fact the distance to and from the satellite. Thus, although North America and England are only about 4800 kilometers apart, a telephone signal from one to the other travels some 80,000 kilometers, by way of a synchronous satellite at an altitude of about 40,000 kilometers, and therefore takes 269 milliseconds.

4. S. R. Heller, H. M. Fales, G. W. A. Milne, Org. Mass Spectrom. 7, 107 (1973); S. R. Heller, D. A. Koniver, H. M. Fales, G. W. A. Milne, Anal. Chem. 46, 947 (1974); S. R. Heller, R. J. Feldmann, H. M. Fales, G. W. A. Milne, J. Chem. Doc. 13, 130 (1973); R. S. Heller, G. W. A. Milne, R. J. Feldmann, S. R. Heller, J. Chem. Inform. Computer Sci. 16, 176 (1976).

5. S. R. Heller, Anal. Chem. 44, 1951 (1972).

6. The WLN are generated from connection tables by means of a program developed by H. Gelernter, Department of Computer Science, State University of New York, Stony Brook 11794 [unpublished work].

7. For details, contact Dr. D. L. Lide, Jr., Office of Standard Reference Data, National Bureau of Standards, Washington, D.C. 20234.

8. ADP Network Services, Inc., Cyphernetics Division, Ann Arbor, Mich. 48106.

9. The combined data base of 30,000 mass spectra and 50,000 literature citations, together with all the programs, occupy some 70 million characters (bytes) of disk storage.

10. H. S. Hertz, R. A. Hites, K. Biemann, Anal. Chem. 43, 681 (1971).

11. These include Finnigan, Hewlett-Packard, INCOS, Varian, and V-G Data Systems.

12. G. M. Pesyna, R. Venkataraghavan, H. E. Dayringer, F. W. McLafferty, Anal. Chem. 48, 1362 (1976).

13. B. A. Jezl and D. Dalrymple, ibid. 47, 203 (1975).

14. O. Kennard, D. G. Watson, W. G. Town, J. Chem. Doc. 12, 14 (1972).

15. R. J. Feldmann, S. R. Heller, C. R. T. Bacon, ibid., p. 234.

16. R. F. Sproull, publication CSL 734 (available from Xerox, Inc., Palo Alto, Calif., 1973).

17. These data are available through the National Technical Information Service, Springfield, Va. 22151, as NBS tape 9.

18. G. McCarthy and G. G. Johnson, paper C3 presented as part of the Proceedings of the American Crystallographic Association meeting, State College, Pa., 1974.

19. F. P. Abramson, Anal. Chem. 47, 45 (1975).

20. R. J. Feldmann, in Computer Representation and Manipulation of Chemical Information, W. T. Wipke, S. R. Heller, R. J. Feldmann, E. Hyde, Eds. (Wiley, New York, 1974), pp. 55

21. B. Vasta, J. Chem. Inform. Computer Sci., in press.

22. Handbook of CIDS Chemical Search Keys (Fein-Marquart Associates, Inc., Baltimore, November 1973).

23. S. R. Heller and A. E. Jacobson, Anal. Chem. 44, 2219 (1972); R. B. Johannesen, J. A. Ferretti, R. K. Harris, J. Magn. Reson. 3, 84 (1970).

24. G. D. Knott and R. I. Shrager, Assoc. Comput. Mach. SIGGRAPH Not. 6, 138 (1972).

25. C. F. Hammer, Department of Chemistry, Georgetown University, unpublished work.

26. H. J. R. Weintraub and A. J. Hopfinger, Intl. J. Quantum Chem. 9, 203 (1975).

27. "Circular A76" (Office of Management and Budget, Washington, D.C., August 1967).