THE MASS SPECTRAL SEARCH SYSTEM
Stephen R. Heller,
Environmental Protection Agency (PM-218),
Washington, D. C. 20460 USA
George W. Milne and Richard J. Feldmann
National Institutes of Health,
Bethesda, MD 20014
This presentation describes our experience with the international Mass Spectral Search System (MSSS), which has been operating commercially for almost three years. The MSSS is a unique system in many ways, notably in that it is a cooperative venture between two governments (US and UK). In addition, within the US government, four agencies (EPA, NIH, FDA and NBS) are working together towards the same goal (Figure 1). This structure is not free of complications, but the problems that have arisen have, for the most part, been solved, and it is felt that international collaboration of this sort is both feasible and worthwhile.
In the period 1971-1972, both EPA and NIH began to develop computer systems for aiding in the identification of compounds from their low resolution mass spectra. Both groups began by using a data base prepared by the Mass Spectrometry Data Centre (MSDC) at Aldermaston, UK. Following some limited attempts by both EPA and NIH to disseminate their systems to the scientific community, where there was considerable interest in the MSSS, a collaborative effort was established between them and the MSDC. Under this arrangement, EPA and NIH, later joined by FDA and NBS, fund continued development of the MSSS, while the UK government takes responsibility for making the MSSS available to the international scientific community. In September 1973, the MSSS was thus made available on the GE Mark III computer network. In July 1975, for technical and economic reasons, the MSSS was transferred to the ADP-Cyphernetics International Computer network.
At present over 125 separate organizations, involving some 200 laboratories in North America and Western Europe, are using the MSSS on a daily basis. From a start of about 10 searches per day in October 1973, the rate of use has grown to over 100 per day some two years later. In addition, about 35 reference spectra from the master file are retrieved (plotted or printed) every day. User interaction with the system developers is moderate, with an average of 3 "CRABS" (comments or complaints written by users onto the system disk file) per week. These CRABS consist of problems, requests for manuals and microfiche of the file and, most importantly, of errors found in the data base.
The US government continues to fund further development of the system, and the operational costs of disk storage and maintenance are covered by user subscription fees, which are $300 per year per organization ($400 during the first year). Of course, a larger file, coupled with new options, tends to raise the yearly operational costs, but the subscription fee for use of the MSSS has remained the same for over two years, during which time the file has grown from 12,879 spectra to 39,509 spectra. In addition to this annual fee, each MSSS option is priced at a fixed cost (shown in Table 1), and the user must pay for connect time to the computer and for the phone call to the nearest node on the commercial computer network. The Cyphernet now has nodes in over 50 cities in North America and Western Europe, and most users can access the network with a local call, thus minimizing their telephone costs. The computer is accessible 24 hours per day, using a variety of terminals such as the standard teletype and teletype-compatible machines, the IBM 2741, and graphics terminals such as the Tektronix 4000 series. The computer can operate at speeds of 10, 15, 30, 120 and 200 characters per second, depending on the user's terminal. Rental for these terminals is, of course, additional to any computer and connect charges.
The original price of a typical PEAK search, the most commonly used option, through a file of 12,879 mass spectra on the GE network was $5 to $20, depending in part on the different pricing structures in different countries. With the universal and fixed transaction pricing scheme adopted by ADP-Cyphernetics, the cost of the same PEAK search through a file of 39,509 spectra is $3.00. Thus, in spite of inflation and a tripling in the size of the data base, the cost of a search has been reduced considerably.
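As a rough illustration of the comparison above, the short Python sketch below normalizes each price to dollars per 1,000 reference spectra searched. The prices and file sizes are the ones quoted in the text, with the ADP-Cyphernetics price read as $3.00. The per-1,000-spectra measure and the function name are assumptions made for this illustration and are not part of the MSSS pricing scheme.

```python
def cost_per_thousand(price_dollars: float, file_size: int) -> float:
    """Return the dollars paid per 1,000 reference spectra searched."""
    return price_dollars / file_size * 1000


# GE network: $5 to $20 per PEAK search through 12,879 spectra
ge_low = cost_per_thousand(5.00, 12_879)
ge_high = cost_per_thousand(20.00, 12_879)

# ADP-Cyphernetics: $3.00 per PEAK search through 39,509 spectra
adp = cost_per_thousand(3.00, 39_509)

print(f"GE network:       ${ge_low:.2f} to ${ge_high:.2f} per 1,000 spectra")
print(f"ADP-Cyphernetics: ${adp:.3f} per 1,000 spectra")
print(f"Reduction factor: {ge_low / adp:.1f} to {ge_high / adp:.1f}")
```

On this measure the saving is larger than the drop in the price of a single search suggests, because each search now covers roughly three times as many spectra.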
The MSSS has now become a stable system, and this success can be traced to a number of factors.
First, no doubt, is the hard work and dedication of the large number of collaborators working on the project.
Second is the attitude of the system designers towards the users. The user, whether novice or experienced, has always been considered a part of the MSSS. Feedback has been encouraged, seminars and demonstrations have been given, and scientists have been encouraged to visit and consult with the various government laboratories where development of the system is carried out. In as many cases as possible, user feedback has been incorporated in the form of new options (e.g. PBM and STIRS), microfiche, new and improved manuals and so on.
Third, the low costs to users of the MSSS have contributed to its acceptance and use. Since two governments were involved in the design of the system, a profit was not required from the MSSS. When it became clear that prices were too high to be commonly accepted, the software was transferred to another network and optimized to provide the service at a price which is considered reasonable and is accepted by the user community.
Last, efforts currently underway and future plans for adding new data bases (13-C NMR, X-ray crystal and powder diffraction) have, hopefully, given users confidence that further computer aids are coming and that by using and supporting the current effort in mass spectrometry, they are helping themselves now and for the future.
The list of current and future options is shown in Table 2. Most of the options have been described previously, and the relevant references are as follows:
Options          Reference
1-14, 16-23      (1-6)
15 b, c          (9,10)
Of the more recently added options, the MSDC Bulletin literature search is the first which does not fall into the category of a program to be used as an aid in identification of unknowns. It uses a data base containing 55,000 literature references to various aspects of mass spectrometry that have appeared in the literature since 1966. It is expected that the file will be updated semi-annually or annually, depending on use by and needs of the system users.
The Chemical Abstracts Service (CAS) Registry data is not really an option in itself, but rather follows from the fact that, with the next system update (August 1976), all compound names and molecular formulas will be those used by the CAS Master Registry File, which will provide consistent quality control in this area for the first time. In addition, a file of synonyms (names which are not used in the Eighth or Ninth Collective Indices of Chemical Abstracts) will be added to the MSSS. Lastly, the CAS Registry Number (REGN) will be appended to every compound in the file. In this way, about 700 spectra, for which structures could not be drawn because of poor and/or ambiguous nomenclature, have already been eliminated from the data base. The REGN is also a means by which duplicate spectra can be identified and subsequently removed from the file. In fact, about one-third of the spectra have been found to be duplicates, indicating that the storage and search costs have been higher than necessary. The Wiswesser line notation (WLN) for each compound in the file will also be available; it is computer-generated from the CAS Registry III connection tables by software written by Gelernter and co-workers at SUNY Stony Brook, under contract to EPA.
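The use of the REGN to find duplicates can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (dictionaries with `id`, `regn` and `name` keys), the function names and the sample records are hypothetical, and the rule of keeping the first spectrum seen for each Registry Number is an assumption, since the text does not say how the retained spectrum was chosen.

```python
from collections import defaultdict


def group_by_regn(spectra):
    """Map each Registry Number to the list of spectra that carry it."""
    groups = defaultdict(list)
    for spectrum in spectra:
        groups[spectrum["regn"]].append(spectrum)
    return groups


def remove_duplicates(spectra):
    """Keep the first spectrum seen for each Registry Number (assumed rule)."""
    return [entries[0] for entries in group_by_regn(spectra).values()]


# Hypothetical records: two entries share a Registry Number under different names
sample = [
    {"id": 1, "regn": "71-43-2", "name": "Benzene"},
    {"id": 2, "regn": "108-88-3", "name": "Toluene"},
    {"id": 3, "regn": "71-43-2", "name": "Benzol"},
]

unique = remove_duplicates(sample)
print([s["name"] for s in unique])
```

Grouping on the Registry Number rather than on the compound name is the point of the example: entries filed under different names or synonyms still collapse to a single compound.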
In addition to being made available to the scientific community, the existing file has been registered by CAS and quality checked. The original quality control work was carried out by McLafferty and co-workers, under contract to EPA, and has resulted in the elimination of thousands of duplicate spectra and the correction of thousands of errors. Since the completion of this contract, over 50,000 additional corrections have been made to produce a data base of the highest available quality. This new data base will first be made available as the MSSS file on the ADP-Cyphernetics network. Later it will be distributed to the public in a manner yet to be determined, but this latter step will take some time since some of the data have to be re-keypunched owing to ownership restrictions on the existing data base. The EPA and NIH are also publishing a book of about 32,000 spectra, with the assistance of CAS. CAS will computer-generate the book, including structural diagrams, spectra and indexes (MF, MW, REGN and Names), using their photocomposition system.
In summary, the data base that will be made available during the summer of 1976 is expected to contain about 32,000 high quality spectra with no duplications. In the future, updates are expected to add about 5,000 new and unique spectra to the file each year.
Summary and Future Prospects
With the experience and positive response gained from the MSSS, we are continuing to upgrade the system as well as expand into other data bases, mainly in the area of spectroscopy. The considerable cooperation and free exchange of data with groups throughout the world indicates that future projects of this nature can also succeed, if the proper efforts and support are available.
Access to MSSS
Readers interested in obtaining further details should contact ADP-Cyphernetics at either 175 Jackson Plaza, Ann Arbor, MI 48106, or J.C. Van Markenlaan 3, Postbus 286, Rijswijk (Z.H.), The Hague, The Netherlands.
In connection with the MSSS, we wish to express particular appreciation to the following: K. Biemann, W.L. Budde, H.M. Fales, W. Greenstreet, D. Henneberg, T.L. Isenhour, D. Koniver, D. Maxwell, J. McGuire, A. McCormick, F.W. McLafferty, J. McSorley, A.W. Pratt, R. Ryhage, M.L. Springer, V. Vinton, S. Woodward, and M. Yaguda.
1. Heller, S. R., McGuire, J. M., Budde, W.L., Trace Organics by GC/MS, Envir. Sci. and Tech., 9, 210-213 (1975).
2. Heller, S. R., Koniver, D. A., Fales, H.M., Milne, G.W.A., Conversational Mass Spectral Search System - Part III, Anal. Chem., 46, 947-950 (1974).
3. Heller, S. R., Feldmann, R. J., Fales, H.M., Milne, G. W. A., A Conversational Mass Spectral Search System - Part IV, J. Chem. Doc., 13, 130-133 (1973).
4. Heller, S. R., Fales, H.M., Milne, G. W. A., A Conversational Mass Spectral Search System - Part II, Org. Mass Spectrom., 7, 107-114 (1973).
5. Heller, S. R., A Conversational Mass Spectral Retrieval System and its use as an Aid in Structure Determination, Anal. Chem., 44, 1951-1961 (1972).
6. Heller, S. R., Feldmann, R. J., Shapiro, K. P. and Heller, R. S., An Application of Interactive Computing: A Chemical Information System, J. Chem. Doc., 12, 41-47 (1972).
7. Feldmann, R.J. and Heller, S.R., An Application of Interactive Graphics - The Nested Retrieval of Chemical Structures, J. Chem. Doc., 12, 48-54 (1972).
8. Hertz, H.S., Hites, R.A. and Biemann, K., Identification of Mass Spectra by Computer - Searching a File of Known Spectra, Anal. Chem., 43, 681-691 (1971).
9. Kwok, K.S., Venkataraghavan, R., and McLafferty, F.W., Computer-Aided Interpretation of Mass Spectra. III. A Self-Training Interpretative and Retrieval System, J. Amer. Chem. Soc., 95, 4185-4194 (1973).
10. Pesyna, G., Venkataraghavan, R., Dayringer, H.E. and McLafferty, F.W., A Probability Based Matching System Using a Large Collection of Reference Mass Spectra, Org. Mass Spec., in press.
11. Bell, H.M., Computer Analysis of Isotope Clusters in Mass Spectrometry, J. Chem. Ed., 51, 548 (1974).
12. Dromey, R.G., Buchanan, B.G., Smith, D.H. and Djerassi, C., Applications of Artificial Intelligence for Chemical Inference. XIV. A General Method for Predicting Molecular Ions in Mass Spectra, J. Org. Chem., 40, 770 (1975).
13. Hertz, H.S., Evans, D.A. and Biemann, K., A User-Oriented Computer-Searchable Library of Mass Spectrometric Literature References, Org. Mass Spec., 4, 453-460 (1970).
O. KENNARD: Now that you have the CAS numbers, could you tell me what the overlap is between data bases?
S. HELLER: We have yet to do this work, but plan to have this done within a few months. At that time we plan to publish the results in the Journal of Chemical Information and Computer Sciences.