The Use of Computers in the Interpretation and Identification of Mass Spectral Data



Stephen R. Heller

US Environmental Protection Agency,

MIDSD

Washington, DC 20460



Abstract



In the past decade computers have become an integral tool for the chemist in obtaining, analyzing and identifying mass spectra. New techniques, such as Fourier Transform MS (FT/MS) and MS/MS, are feasible only with the use of computers to handle the voluminous data in the time frame necessary to perform the experiments. This paper presents a survey of these activities, with an emphasis on the use of the computer as an aid in structure elucidation, and on the various mass spectral databases.



In the early 1970's, most mass spectrometers were stand-alone instruments. In the past decade, the combination of gas chromatography (GC) with mass spectrometry, coupled with automated data collection and analysis equipment, has led to considerable improvements in the field. Today, one can process a complex mixture of hundreds of components quickly and accurately, using a routine GC/MS with a data system. This paper is divided into two main areas: databases of mass spectra, and the computer programs which search these databases or analyze mass spectra for structure determination.



Databases of mass spectra



Over the years, starting with the American Petroleum Institute Project 44 activities, small collections of mass spectra have been assembled by different groups. Database collection activities did not become formalized until the British Government, in 1965, initiated funding of the Mass Spectrometry Data Centre (MSDC) and the US Government, under the National Institutes of Health (NIH) and Environmental Protection Agency (EPA), undertook a similar effort. In addition, Stenhagen, Abrahamsson, and McLafferty prepared a large collection of mass spectra. Today there are two major computer-readable collections: that of the US Government, distributed under the auspices of the National Bureau of Standards (NBS), and the McLafferty collection (1).

The former collection has a quality index associated with each spectrum, based on NBS-endorsed evaluation criteria (see Figure 1). In addition to these two databases in computer-readable form, there are two major printed publications of mass spectral data. One is the collection of complete bar plots of mass spectra distributed by the US Government Printing Office (2). An example of a page from this publication is shown in Figure 2. The second is the 3rd edition of the MSDC Eight Peak Index (3), which complements the former collection of complete spectral data. In addition to these EI collections, there are a number of other mass spectral databases of CI and FAB spectra, details of which are given in Figure 3.
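As an illustration of how a complete spectrum relates to an eight-peak entry, the following is a minimal sketch in Python. It assumes a spectrum is held as a mapping of m/z to abundance; the function name and the sample values are hypothetical and are not taken from the MSDC publication, whose actual layout and sort orders are not reproduced here.

```python
def eight_peak_index(spectrum, n_peaks=8):
    """Reduce a complete spectrum to its n most intense peaks.

    `spectrum` maps m/z to abundance (an assumed representation).
    Returns (m/z, relative intensity) pairs, strongest first, with
    intensities rescaled so that the base peak is 100.
    """
    if not spectrum:
        return []
    base = max(spectrum.values())
    if base <= 0:
        return []
    # Sort by descending abundance; ties are broken by ascending m/z.
    ranked = sorted(spectrum.items(), key=lambda peak: (-peak[1], peak[0]))
    return [(mz, round(100.0 * ab / base, 1)) for mz, ab in ranked[:n_peaks]]


# Made-up abundances for illustration only; not a real compound.
example_spectrum = {27: 120, 39: 150, 41: 300, 43: 1000, 55: 80,
                    57: 450, 71: 260, 85: 190, 86: 12, 99: 40}

for mz, rel in eight_peak_index(example_spectrum):
    print(mz, rel)
```

The two weakest peaks of the ten are dropped, which is why an abbreviated index of this kind complements, rather than replaces, a collection of complete spectra.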



Library search methods



During the 1970's, along with the development of databases and dedicated computer systems for mass spectrometers, a number of techniques were developed for searching and identifying chemicals. The first major approach, a batch search on a mini-computer, was developed by Hertz, Hites and Biemann (4). From there an interactive, online system was developed, which has enjoyed wide use as timesharing computers have become more popular.

An example of a simple interactive search based on just a few peaks from the mass spectrum is shown in Figure 4. McLafferty developed two search techniques: one based on the probability of certain ions (PBM), and another (STIRS) based on a collection of chemical fragments associated with certain fragmentation patterns (6). Since the late 1970's most activity in this area has been in refining existing methods. A summary of most of the currently available commercial systems is shown in Figure 5.
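The idea of a search on just a few peaks can be sketched as follows. This is a minimal illustration in Python, not the algorithm of any of the systems cited, and it is neither PBM nor STIRS. The library entries, the function name, and the query format (m/z with a lower and upper intensity bound) are all assumptions made for the example.

```python
def peak_search(library, queries):
    """Return the names of library spectra that satisfy every peak query.

    `library` maps a compound name to a spectrum (m/z -> relative intensity).
    Each query is (mz, low, high): the spectrum must show a peak at `mz`
    with an intensity between `low` and `high` inclusive. An absent peak
    is treated as intensity 0.
    """
    hits = []
    for name, spectrum in library.items():
        if all(low <= spectrum.get(mz, 0) <= high for mz, low, high in queries):
            hits.append(name)
    return sorted(hits)


# Hypothetical three-entry library with made-up intensities.
toy_library = {
    "compound A": {43: 100, 58: 35, 71: 10},
    "compound B": {43: 100, 57: 60, 85: 20},
    "compound C": {43: 80, 57: 100, 71: 45},
}

# Each additional peak narrows the list of candidates.
queries = []
for query in [(43, 50, 100), (57, 40, 100), (71, 30, 100)]:
    queries.append(query)
    print(query, "->", peak_search(toy_library, queries))
```

In an interactive session the chemist would enter one peak at a time and stop as soon as the candidate list is short enough to inspect by eye, which is what the loop above imitates.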



Artificial intelligence techniques



Concurrent with the development of databases and computer search techniques has been the use of artificial intelligence methods to determine chemical structures from their mass spectra. The need for these techniques is obvious when one considers that the universe of chemicals consists of perhaps more than 10 million compounds, whereas the largest of the mass spectral databases contains some 50,000.



Amongst the techniques used to aid the chemist have been pattern recognition and cluster analysis. Pattern recognition techniques assume one knows the chemical classification of the unknown, whereas cluster analysis allows the unknown to be classified in a natural fashion, creating classes as the data are processed.
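To make concrete what "creating classes as the data are processed" can mean, the following is a minimal sketch in Python of a simple leader-style clustering using cosine similarity between spectra. It is one possible scheme chosen for illustration, not the method of any work cited here; the similarity measure, the threshold, and the toy spectra are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two spectra held as m/z -> intensity maps."""
    dot = sum(a[mz] * b[mz] for mz in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def leader_cluster(spectra, threshold=0.8):
    """Assign each spectrum to the first class whose leader it resembles.

    `spectra` is a sequence of (name, spectrum) pairs. A spectrum that is
    not similar enough to any existing leader starts a new class, so the
    classes are not fixed in advance but emerge as the data are processed.
    """
    clusters = []  # each entry: {"leader": spectrum, "members": [names]}
    for name, spectrum in spectra:
        for cluster in clusters:
            if cosine_similarity(spectrum, cluster["leader"]) >= threshold:
                cluster["members"].append(name)
                break
        else:
            clusters.append({"leader": spectrum, "members": [name]})
    return [cluster["members"] for cluster in clusters]


# Made-up spectra for illustration only.
toy_spectra = [
    ("s1", {43: 100, 57: 60, 71: 30}),
    ("s2", {43: 90, 57: 70, 71: 25}),
    ("s3", {51: 40, 77: 100, 105: 80}),
    ("s4", {51: 30, 77: 95, 105: 85}),
    ("s5", {43: 100, 58: 40}),
]

print(leader_cluster(toy_spectra))
```

The result depends on the threshold and on the order in which spectra are presented. With these toy data, lowering the threshold from 0.8 to 0.7 merges the last spectrum into the first class, which shows how sensitive such "natural" classes can be to the parameters chosen.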



In a number of cases these techniques appear to work well, but they are limited to select classes of compounds, such as steroids and straight-chain amines. The problem, of course, lies in the fact that unknowns are more often totally unknown (such as in environmental samples), and a technique which works only on a particular class of compounds is of minimal use in these cases.



Summary



The use of computers as a tool in aiding the chemist in the identification of unknown chemicals from their mass spectra has produced impressive and valuable results. However, as long as fragmentation patterns are not better understood, the use of computers in this application will remain limited. Library techniques are limited by the size and nature of the library, relative to the particular problem of the chemist.



With further research into statistical and other multivariate analysis methods, and into artificial intelligence methods such as cluster analysis, it is hoped a better understanding will be developed for mass spectral identification. However, one must begin to seriously consider the possibility that mass spectrometry, in any form of ionization technique (EI, CI, FAB, ED and so forth), may have such inherent limitations that additional techniques (such as IR and NMR) will be mandatory in many cases and classes of compounds when determining the structure of unknown chemicals. Certainly, to be sure of any identification, one must use independent confirmatory data, which would include other spectral information (such as IR and NMR), as well as such mundane measurements as GC retention time, melting point or boiling point, and so forth.



References



1. The US Government MS database is available from the NBS, Office of Standard Reference Data, Building 221, Room A-318, Washington, DC 20234. The McLafferty database is available from John Wiley & Sons, Electronic Publishing Division, 605 Third Avenue, New York, New York 10158.



2. Heller, S. R. and Milne, G. W. A., The EPA/NIH Mass Spectral Data Base, NSRDS NBS-63, US Government Printing Office, Washington, DC 20402. The first four volumes can be ordered as stock number 003003-01987-9. Volume 5, which is Supplement #1, can be ordered as stock number 003 003412268 3. Supplement #2, which will be Volume 6, will be available for purchase by early 1984 from the Government Printing Office.



3. The Eight Peak Index of Mass Spectra, 3rd Edition 1983, is available from The Royal Society of Chemistry, The University, Nottingham, NG7 2RD, England.



4. Hertz, H., Hites, R.A. and Biemann, K., Anal. Chem. 43 (1971) 681.



5. Heller, S. R., Anal. Chem. 44 (1972) 1951.



6. Dayringer, H. E. and McLafferty, F. W., Org. Mass Spectrom. 11 (1977) 543.