THE NIH/EPA MASS SPECTRAL DATA BASE AND SEARCH SYSTEM

S. R. Heller, U.S. Environmental Protection Agency, Washington, DC, USA,

W. L. Budde, U.S. Environmental Protection Agency, Cincinnati, OH, USA,

D. P. Martinsen, Fein-Marquart Associates, Inc., Baltimore, MD, USA,

and G. W. A. Milne, National Cancer Institute, Bethesda, MD, USA



ABSTRACT

The NIH/EPA Mass Spectral Search System (MSSS), together with its associated data bases of EI and CI spectra and proton affinity/gas-phase basicity values, is a

part of the NIH/EPA Chemical Information System (CIS). The CIS is a collection

of spectroscopic, toxicological, regulatory and chemical structure data bases

and associated search software and analysis systems. The MSSS contains a number

of algorithms for assistance in the interpretation of mass spectral data. In

addition to several different library retrieval methods which search a data base

of nearly 39,000 spectra, the MSSS contains some pattern recognition routines to

aid in classifying spectra which are not present in the library.



The NIH/EPA Mass Spectral Search System (MSSS) (ref. 1) is one of the components of the NIH/EPA Chemical Information System (CIS) (ref. 1). It consists of a set of programs to search data bases of EI and CI spectra and proton affinity/gas-phase basicity values. In the spring of 1982, the EI data base contained the low resolution spectra for nearly 39,000 different compounds. The data base contains one spectrum per compound and does not include isotopically labelled compounds. These spectra were selected on the basis of their quality index, as determined according to a modification of an algorithm based on the work of McLafferty and co-workers (ref. 2), from an archival file of over 70,000 spectra. The spectra have been contributed by laboratories around the world, with individual contributions ranging from 1 or 2 to several thousand spectra. Although large, the data base is by no means comprehensive. There are many important compounds for which mass spectra are not present. As an example, the EPA inventory of chemicals under the Toxic Substances Control Act (TSCA) contains 54,196 chemicals either produced in or imported into the United States. Of these, only 5,246 are present in the Mass Spectral Data Base. At the present time, the EPA, through its Office of Research and Development, is adding to the data base some of these missing but important compounds. The spectra are being collected under very carefully controlled conditions. Each day before the spectra of any reference compounds are measured, the instrument must be tuned to meet the criteria suggested by Eichelberger et al. (ref. 3) for the compound bis(pentafluorophenyl)phenyl phosphine. In addition, the sample purity is very

carefully controlled. A study of the first 700 spectra from this project

revealed that while the average quality index for the entire data base is about

500, the average quality index for these 700 spectra is over 700 out of a

possible 1,000 (ref. 4). This indicates a marked improvement over the data base

at large.
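The coverage figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the two TSCA counts given in the text; the function name `coverage_percent` is an illustrative choice, not part of the MSSS.

```python
def coverage_percent(present: int, total: int) -> float:
    """Return the percentage of `total` items that are `present`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * present / total


# Counts taken from the text: TSCA inventory size and the number of
# those chemicals that have a spectrum in the Mass Spectral Data Base.
TSCA_TOTAL = 54_196
TSCA_WITH_SPECTRUM = 5_246

tsca_coverage = coverage_percent(TSCA_WITH_SPECTRUM, TSCA_TOTAL)
tsca_missing = TSCA_TOTAL - TSCA_WITH_SPECTRUM

print(f"TSCA coverage: {tsca_coverage:.1f}%")
print(f"TSCA chemicals without a spectrum: {tsca_missing}")
```

On these figures, slightly under one tenth of the TSCA inventory is represented in the data base.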



In addition to its availability online as part of the MSSS, the data base is also distributed on magnetic tape (ref. 5) and in book form (ref. 6). As of Spring 1982, five volumes of spectra have been published along with an index volume.



The data base of chemical ionization spectra currently contains approximately 1,500 spectra. These have been collected under a variety of conditions, using several different reagent gases. In some cases, more than one spectrum has been included for the same compound. Currently, an attempt is being made to make some modifications to the quality index algorithm for EI spectra (ref. 2) in order to apply it to CI spectra. The major differences in the spectral features are: a) the ions greater than the molecular ion (not allowed in EI, certain ones allowed in CI), b) the number of ions (fewer expected for CI), and c) illogical neutral losses (certain illegal losses in EI may be allowed for CI). In addition, since CI work generally requires a higher electron voltage than EI, the upper limit to the ionizing voltage has been removed.



The data base of proton affinity/gas-phase basicity values has been compiled by the Ion Energetics Data Center at the National Bureau of Standards. It contains 650 measurements for 420 compounds. Included are literature citations to articles describing the measurements as well as a brief summary of the method used to determine the proton affinity or gas-phase basicity value.



The MSSS allows one to search the data bases mentioned above in a number of ways. Due to the capabilities of the MSSS as well as other parts of the CIS, compounds may be retrieved on the basis of molecular weight, molecular formula, partial formula, chemical name, trade name, CAS Registry Numbers and partial or full chemical structures. The mass spectrum may be used as the basis for an interactive peak search in which compounds containing specific peaks with specific intensity ranges may be retrieved. The KB (ref. 7) and PBM (ref. 8) searches retrieve spectra similar to an unknown spectrum based on algorithms comparing abbreviated forms of the spectra. Library spectra may be displayed in a printed tabular form or, if a suitable graphics device is available, in graphics form.
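The peak search described above can be pictured as a successive narrowing of a candidate list, one peak constraint at a time. The following is a minimal sketch under stated assumptions, not the MSSS implementation: the three library spectra are invented for illustration, and the names `LIBRARY` and `peak_search` are hypothetical.

```python
from typing import Dict, List, Tuple

# A spectrum maps m/z to relative intensity on a 0-100 scale.
Spectrum = Dict[int, float]

# Toy library. These spectra are invented for illustration only and
# are NOT taken from the NIH/EPA data base.
LIBRARY: Dict[str, Spectrum] = {
    "compound A": {43: 100.0, 58: 35.0, 71: 12.0},
    "compound B": {43: 20.0, 77: 100.0, 105: 60.0},
    "compound C": {43: 95.0, 58: 5.0, 85: 40.0},
}

# One constraint: (m/z, minimum intensity, maximum intensity).
Constraint = Tuple[int, float, float]


def peak_search(library: Dict[str, Spectrum],
                constraints: List[Constraint]) -> List[str]:
    """Return names of spectra that satisfy every peak constraint.

    A spectrum matches a constraint only if it contains a peak at the
    given m/z whose intensity lies within the inclusive range.
    """
    hits = list(library)
    for mz, low, high in constraints:
        hits = [name for name in hits
                if mz in library[name] and low <= library[name][mz] <= high]
    return hits


# Each added constraint narrows the candidate list, as in an
# interactive session.
print(peak_search(LIBRARY, [(43, 80.0, 100.0)]))
print(peak_search(LIBRARY, [(43, 80.0, 100.0), (58, 20.0, 100.0)]))
```

Applying the constraints one at a time mirrors the interactive use of the search, in which the user inspects the number of hits before entering the next peak.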



As mentioned above, there are many important compounds which are not represented in the data base. Therefore, a number of pattern recognition approaches have been developed to provide information about a compound even when its spectrum is not in the library. The MOLION program, written by Dromey and coworkers at Stanford University (ref. 9), determines likely molecular weights for a compound based on the mass spectrum. The MSTREE programs, developed by Technology Service Corporation (ref. 10), use a decision tree approach to determine the presence or absence of specific functional groups and assign a probability to each determination. One of the advantages of this approach is that while the generation of the decision tree may be quite tedious and time consuming, the analysis of a spectrum is very rapid. The analysis consists simply of traversing one path down the decision tree to a terminus. The program currently checks for chlorine, bromine, nitrogen and phenyl.
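The speed of the decision tree analysis follows from the fact that a spectrum visits only one node per level. The sketch below illustrates such a traversal under loud assumptions: the tree, its m/z tests, thresholds and probabilities are all invented, and the names `Node`, `Leaf`, `TOY_TREE` and `classify` are hypothetical and do not describe the actual MSTREE programs.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """Terminus of the tree: a decision and its assigned probability."""
    present: bool
    probability: float


@dataclass
class Node:
    """Internal node: compare the intensity at one m/z to a threshold."""
    mz: int
    threshold: float
    below: Union["Node", Leaf]   # followed when intensity < threshold
    above: Union["Node", Leaf]   # followed when intensity >= threshold


# Invented tree for a single functional-group question. The tests and
# probabilities are placeholders, NOT values from MSTREE.
TOY_TREE = Node(
    mz=36, threshold=10.0,
    below=Leaf(present=False, probability=0.90),
    above=Node(
        mz=38, threshold=3.0,
        below=Leaf(present=False, probability=0.60),
        above=Leaf(present=True, probability=0.85),
    ),
)


def classify(tree: Union[Node, Leaf], spectrum: Dict[int, float]) -> Leaf:
    """Follow one path from the root to a leaf and return that leaf."""
    node = tree
    while isinstance(node, Node):
        intensity = spectrum.get(node.mz, 0.0)  # absent peak counts as 0
        node = node.above if intensity >= node.threshold else node.below
    return node


result = classify(TOY_TREE, {36: 50.0, 38: 16.0})
print(result.present, result.probability)
```

The cost of one analysis is proportional to the depth of the tree rather than to its total size, which is why building the tree can be slow while using it is fast.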



In addition to the MSSS, the CIS contains a number of other spectroscopic, toxicological, regulatory and chemical structure data bases as well as associated search systems and analysis software. There is experimental data for C-13 NMR and infrared spectroscopy and x-ray crystallography, as well as single crystal parameters. The Registry of Toxic Effects of Chemical Substances (RTECS) may be searched and displayed. OHM/TADS contains information to aid in the cleaning up of chemical spills. The Federal Register Search System (FRSS) contains all chemicals cited in the Federal Register and is updated weekly. Two new components have recently become available: a data base of nucleotide sequences (NUCSEQ) and the data base of producers and importers of chemicals in the TSCA Inventory (TSCAPP). The Structure and Nomenclature Search System (SANSS) may be used to search by and display names, formulas, and structures of all compounds in the CIS. The CIS currently contains over 225,000 compounds.



REFERENCES



1. The MSSS and CIS are available from CIS, Inc., 7215 York Rd., Baltimore, MD 21212, USA (301-321-8440).



2. D.D. Speck, R. Venkataraghavan and F.W. McLafferty, Org. Mass Spectrom. 13, 209 (1978).



3. J.W. Eichelberger, L.E. Harris and W.L. Budde, Anal. Chem. 47, 995 (1975).



4. G.W.A. Milne, W.L. Budde, S.R. Heller, D.P. Martinsen and R.G. Oldham, Org. Mass Spectrom., accepted for publication.



5. The Mass Spectral Data Base on tape is available from the U.S. National Bureau of Standards, Office of Standard Reference Data, Building 221, Room A318, Washington, DC 20234 (Attention: Dr. L.H. Gevantman, 301-921-3442).



6. The Mass Spectral Data Base books are available from the U.S. Government Printing Office, Washington, DC 20402 in two parts. The first is a set of four volumes (Order# NSRDS-NBS#63, 003-003-01987-9). The second set contains the fifth volume and a revised index (Order# NSRDS-NBS#63 Supplement #1, 003-003-02268-3).



7. H.S. Hertz, R.A. Hites and K. Biemann, Anal. Chem. 43, 681 (1971).



8. G.M. Pesyna, R. Venkataraghavan, H.E. Dayringer and F.W. McLafferty, Anal. Chem. 48, 1362 (1976).



9. R.G. Dromey, B.G. Buchanan, D.H. Smith, J. Lederberg and C. Djerassi, J. Org. Chem. 40, 770 (1975).



10. W.S. Meisel, M. Jolley, S.R. Heller and G.W.A. Milne, Anal. Chim. Acta 112, 407 (1979).