Conversational Mass Spectral Retrieval System and Its Use as an Aid in Structure Determination, Anal. Chem., 44, 1951-1961 (1972)

Conversational Mass Spectral Retrieval System and Its Use as an Aid in Structure Determination

Stephen R. Heller
Heuristics Laboratory,
Division of Computer Research and Technology,
National Institutes of Health,
Public Health Service, Bethesda, Md. 20014

ABSTRACT

An interactive, conversational mass spectral retrieval system consisting of a collection of computer programs designed to give immediate retrieval of mass spectral data is described. The system options include a peak/intensity search, a molecular weight search, a complete molecular formula search, an imbedded molecular formula search, and printout of the peaks and intensities of the entire spectrum. The programs used to generate and search the files, as well as the file structure, are described.

Of all the physical data available to the organic chemist, mass spectral data are the most readily compatible with the computer. In particular, the so-called "low resolution" mass spectra, with peaks at unit mass intervals, are widely available in large numbers and are of great value in structural elucidation. Furthermore, when only very small samples of the material are available, such as from a gas chromatograph, the mass spectrum may be the best available information for determining the structure of the unknown chemical.

A number of investigators (1-7) have developed methods for processing and searching mass spectral data. None of the current methods, however, includes an interactive search capability in which the bench chemist can modify his demands while the search is under way. Our experience in the development of a Chemical Information System (8) has led us to the conclusion that computer programs should be highly interactive, conversational, and available over ordinary telephone lines from the chemist's lab.

The system is designed to be used for both routine identification and for assistance in the determination of unknown structures. The programs described here are used daily and routinely from laboratories at NIH. The programs give virtually instantaneous response at the chemist's teletype. In addition, an extensive user's manual is available which describes how the program is used and gives numerous examples of the different types of searches possible with the system (9).

The philosophy behind the development of the system was to design programs that respond rapidly to the chemist's query directly from his lab and that stimulate his consideration of the next logical response. The data file used in the search program is the same file used by Hertz, Hites, and Biemann (7), and was kindly made available by the authors. In this method of spectral data abbreviation, the two most intense peaks in each interval of fourteen mass units are selected from the complete spectrum for matching. Other methods include using n peaks per m-unit interval, where n and m are other than 2 and 14, as well as selecting the five to ten largest peaks in the spectrum irrespective of mass value. The abbreviated (or compressed) spectrum used here has the virtue of following the chemist's own pattern of thought, which is to look for patterns and peak clusters. For example, a difference of fourteen amu (CH2) between groupings indicates the presence of straight-chain molecules. Indeed, the search interval of fourteen amu was chosen specifically so that the computer would give a homologous series of ions the same relative importance that a chemist would give it. Figure 1 shows the distribution of the abbreviated spectra as a function of m/e value. Normally the low mass region contains many intense peaks, and the above method of selection clearly discriminates against ions in this region. The main reasons for compressing or abbreviating spectra have been storage limitations and search speed. Third- and fourth-generation computers, with vast on-line storage capacity (shared by many users in a time-sharing system) and greater speed, make these reasons less valid. In addition, a well-structured file can improve the "apparent" speed of the program.
While the system described here does use an abbreviated spectrum, abbreviation does not appear to be necessary; indeed, it proves costly, both in the time needed to select the peaks that make up the abbreviated spectrum and in the loss of some information.

Those masses with more than 1000 occurrences in the selected file are shown in Table I.

In addition to a peak and intensity search, it was felt that molecular weight and molecular formula searches would also be of value in both structural elucidation and routine identification problems. An overview of the system is shown in Figure 2.

EXPERIMENTAL

Peak and Intensity Search. The main search program in the system is the peak and intensity search program. The data base being searched uses the two largest peaks in every 14 m/e interval, starting at m/e 6 (i.e., 6-19, 20-33, etc.). The original file of 8124 spectra contained 762,162 peaks and their intensities. The first computer program selected the two largest peaks and their intensities in each 14 m/e interval of the original file. The resulting 185,396 peaks (ranging from m/e 9 to 1337) represent 24.3% of the file, vs. a theoretical reduction to 2/14 = 1/7, or 14.3%. The file is considerably larger than expected because many 14 m/e intervals, particularly at higher mass values, contain considerably fewer than fourteen peaks, so the two largest must be selected from fewer than fourteen possibilities.
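The abbreviation step can be illustrated with a short sketch. It assumes a spectrum given as a Python dict mapping integer m/e values to relative intensities; the function name `abbreviate`, the handling of peaks below m/e 6 (ignored), and the tie-breaking rule (lower mass wins) are illustrative assumptions, not details taken from the original FORTRAN programs.

```python
def abbreviate(spectrum, start=6, width=14, keep=2):
    """Return the abbreviated spectrum as {m/e: intensity}.

    spectrum maps integer m/e values to relative intensities. Peaks below
    `start` are ignored (an assumption; the lowest stored peak is m/e 9).
    """
    intervals = {}
    for mass, intensity in spectrum.items():
        if mass < start:
            continue
        # Interval 0 is m/e 6-19, interval 1 is 20-33, and so on.
        intervals.setdefault((mass - start) // width, []).append((mass, intensity))
    kept = {}
    for peaks in intervals.values():
        # Most intense first; ties go to the lower mass (an arbitrary choice).
        peaks.sort(key=lambda p: (-p[1], p[0]))
        for mass, intensity in peaks[:keep]:
            kept[mass] = intensity
    return dict(sorted(kept.items()))
```

An interval containing only one or two peaks keeps all of them, which is why the abbreviated file is larger than one-seventh of the original.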

It might logically be assumed that the proper way for the chemist to enter data from an unknown mass spectrum would be to select, in the same way and beginning at m/e 6, the two most intense peaks in each fourteen amu interval. Strictly speaking, entering more than two peaks that fall within any fourteen amu span is risky unless the interval boundaries have been located by counting off the spectrum from m/e 6, because three such peaks may all fall in a single stored interval, which holds only two. In practice, this point rarely needs stressing, because real mass spectra usually have their peaks distributed fairly uniformly with mass. An important case where an error might be introduced is a spectrum in which a fragment ion is so intense that its 13C satellite must be included in the abbreviated spectrum. In such a case, another important fragment occurring within the same fourteen amu interval would be overlooked.

After the peaks were sorted by increasing mass (m/e value), the next program took each mass and its list of references (i.e., spectrum ID numbers) and generated a disk file of m/e values with pointers to a second disk file containing the references to these m/e values.

It will be helpful to describe some details of the file structure for a clear understanding of the search and retrieval method. The first disk file is the pointer file and contains a cell (one computer word) for each m/e value up to 1337. The second disk file is the reference file, which contains the references or ID numbers associated with a given m/e value. At the beginning of file generation, all the cells in both files contain -1. As the generation program proceeds, if there are any references at a given m/e, the -1 in that cell of the pointer file is replaced by a pointer number. At the same time, the references (in this case, the ID number and the intensity of the peak, packed together into one 36-bit computer word) are put sequentially into the disk reference file, one at a time. After all the references for a particular m/e value are in the file, the next word is left as -1 to mark a breakpoint between reference lists. The pointer number system works quite simply. The first peak found in the file is at m/e 9, and there are three references to this value. The next peak is at m/e 10, with four references. In cell 9 the program stores the value 1, indicating that entries for this m/e value exist and begin at word 1 of the reference file. In cell 10 the program stores the value 5, which is the previous pointer value + the number of references at the previous m/e value + 1 for the breakpoint cell, i.e., 1 + 3 + 1 = 5. In cell 11, the program stores the value 10 (5 + 4 + 1). To find the number of references that contain m/e 10, one simply goes to the 10th cell and, since it is not -1, subtracts the value in cell 10 (5) from the value in cell 11 (10) and then subtracts one more for the breakpoint: 10 - 5 - 1 = 4. The value in cell 10 also tells one where in the reference file the ID numbers and intensities are.
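The pointer and reference files can be mimicked in memory to check the arithmetic above. This is a minimal sketch, assuming Python lists stand in for the two disk files and that each packed ID/intensity word is represented by a plain positive integer; `build_files` and `references_at` are hypothetical names.

```python
EMPTY = -1  # both disk files start out filled with -1

def build_files(refs_by_mass, max_mass=1337):
    """Build in-memory stand-ins for the pointer and reference files.

    refs_by_mass maps m/e -> list of packed references (ID and intensity).
    pointer[m] holds the 1-based reference-file position of m's list, or -1.
    """
    pointer = [EMPTY] * (max_mass + 1)  # index 0 unused, so pointer[m] is cell m
    reference = []                      # reference[p - 1] is word p of the file
    for mass in sorted(refs_by_mass):
        pointer[mass] = len(reference) + 1
        reference.extend(refs_by_mass[mass])
        reference.append(EMPTY)         # breakpoint between reference lists
    return pointer, reference

def references_at(pointer, reference, mass):
    """Read the list for one m/e, stopping at the -1 breakpoint."""
    position = pointer[mass]
    if position == EMPTY:
        return []
    found = []
    while reference[position - 1] != EMPTY:
        found.append(reference[position - 1])
        position += 1
    return found
```

With three references at m/e 9, four at m/e 10, and one at m/e 11, the pointer cells come out as 1, 5, and 10, matching the worked example, and `pointer[11] - pointer[10] - 1` gives the count of 4.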

In the PDP-10 computer, the basic disk file unit is a block of 128 36-bit computer words. Therefore, the pointer cell for a given mass lies in block ((mass - 1)/128) + 1 of the pointer file, which is block 1 for m/e 10, and the reference list begins in block ((cell value - 1)/128) + 1 of the reference file, which is also block 1 for the cell value of 5. Because a block number must be an integer, it is found by integer division, which simply truncates any fractional part of the quotient. Thus 8/3 is 2 and not 3, and 10/128, 100/128, and 1/128 are all 0. In block 1, one then proceeds to word modulo(cell value - 1, 128) + 1 = modulo(4, 128) + 1 = 4 + 1 = 5. The modulo function, modulo(a, b), is defined as a - (a/b)b, where a/b is an integer division. Because the division is performed before the multiplication, when a is a number from 1 to 127 and b is 128, a/b is always 0, so modulo(a, 128) = a. The following cells are read until a cell containing -1 is encountered, which marks the end of that list of references. Thus the file structure is as simple to generate and understand as it is rapid in retrieving the data. Printouts of the first block of the peak pointer file and the first block of the peak reference file are presented in Tables II and III. A schematic diagram of the peak and intensity search program is shown in Figure 3.
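The block and word arithmetic can be written out directly. The sketch below assumes 1-based file positions, as in the text; `fortran_mod` mirrors the paper's definition of modulo, and both helper names are hypothetical.

```python
WORDS_PER_BLOCK = 128  # one PDP-10 disk block

def fortran_mod(a, b):
    """modulo(a, b) as defined in the text: a - (a/b)*b with integer division.

    Python's // floors rather than truncates, which gives the same result
    for the non-negative values used here.
    """
    return a - (a // b) * b

def block_and_word(cell_value, words_per_block=WORDS_PER_BLOCK):
    """Translate a 1-based file position into a 1-based (block, word) pair."""
    block = (cell_value - 1) // words_per_block + 1
    word = fortran_mod(cell_value - 1, words_per_block) + 1
    return block, word
```

For the m/e 10 example, `block_and_word(5)` gives block 1, word 5; the same function reproduces the molecular weight lookup described later, where a cell value of 1816 leads to block 15, word 24.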

One feature considered a necessity was the ability to filter out masses whose intensities differ greatly from the intensity in the unknown, or to look selectively at certain masses (for example, only spectra having their base peak, relative intensity 100.0%, at a given m/e value).

Rather than applying a fixed intensity filter, such as allowing the largest peak in the unknown to be up to 25% larger or smaller than in the known spectrum, the search program begins by asking the chemist to select the value he wants. The values of the intensities of the peaks range from 0 to 100 (i.e., 0.01% to 100.0%). Typically, a factor of 2 - is used. This means that each intensity value entered into the search program is multiplied and divided by this intensity range factor to give upper and lower limits for the search. It has been found that lower intensity factors should be used for low m/e values, for two reasons. First, the intensity variations between instruments at low m/e (~100) are much smaller than at high m/e values. Second, there are many more peaks at lower m/e values, so filtering speeds up the search and presents fewer answers of little value. The intensity range factor can be used in a number of ways. If a factor of 1 is used with a peak intensity of 100, only those spectra with this ion as their base peak will be obtained. On the other hand, if the range factor is 100 or greater, there is effectively no intensity filtering, because any intensity divided by 100 gives a window covering the range 0 to 100. A range factor of 2 and an intensity of 150 (150%) will allow only peaks with an intensity greater than 75 (75%) to be considered. Thus, the variable intensity filter allows the chemist to experiment with the system as he searches, and it is an important psychological factor in gaining acceptance of the search system in the laboratory. The range factor is chosen at the start of the search and is held constant for all peaks entered and considered in that search. It would be a simple matter to allow a different factor for each peak, but the additional decision required at each step of the search was considered to be more annoyance than it was worth.
Note also that multiplication and division by an integer implies very wide range factors; i.e., increasing and decreasing intensities by as little as 90% is not permitted (0.9 is a non-integer value). In this system, in contrast to some others available (6,7), we are looking for an answer or a series of suggestions after the input of two or three peaks, rather than a "degree of fit" or "similarity index" between the entire spectrum and one on file. The rationale behind this approach is that quite often the required spectrum will not be in the file, but the spectrum of a similar substance, also exhibiting the peaks in question, may be. If such a near-hit is found, it has been our experience that the chemist is willing to perform the necessary mental gymnastics to extrapolate to a correct conclusion.
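The intensity range factor amounts to a window around each query intensity. A minimal sketch, assuming integer arithmetic like the FORTRAN programs and inclusive limits (the paper says "greater than 75", so whether the boundary itself passes is a guess); the function names are hypothetical.

```python
def intensity_window(intensity, factor):
    """Lower and upper limits for one query peak.

    factor is the integer range factor chosen at the start of the search;
    integer division mirrors the FORTRAN arithmetic (an assumption).
    """
    if factor < 1:
        raise ValueError("range factor must be a positive integer")
    return intensity // factor, intensity * factor

def passes_filter(file_intensity, query_intensity, factor):
    # Limits are treated as inclusive here, which is a guess.
    low, high = intensity_window(query_intensity, factor)
    return low <= file_intensity <= high
```

With a factor of 2 and a query intensity of 150 the window is 75 to 300, and a factor of 1 with intensity 100 admits only base peaks.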

After the chemist gives the program the intensity range factor, the program asks for peaks and intensities, one peak and intensity at a time. After each entry, the program responds with the total number of peaks in the file at that m/e value and the number of spectra that pass the intensity filter. In this manner, the chemist gets some idea of the size of the partial file resembling his unknown while he is tailoring the search to his own needs. After the second peak and intensity, and for every peak and intensity thereafter, the program automatically finds all the spectra that contain that combination of peaks and intensities. If no spectra contain the combination, the program reports this and allows the chemist to go back to the previous set of references found, so that he may obtain a listing of those ID numbers and names. The program then automatically starts again from the beginning, with a request for a new intensity factor. At any point along the way, the chemist has the option of printing the ID numbers and names of the compounds meeting the current peak and intensity criteria. To reach a useful result sooner, it is generally best to enter the highest masses first, since the number of peaks at high masses tends to be smaller. This is satisfactory provided that the compound is in the file; if it is not, this search method may fail to locate an interesting lower homolog or related compound with a similar mass spectral fragmentation pattern. In fact, it is often quite useful to add 14, or even 28, m/e units to the masses under consideration, or to subtract them. In general, an average search takes from five to fifteen minutes at a computer terminal, although with some experience with the system a search can take as little as two minutes.
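The interactive narrowing described above can be sketched as a small session object. This is a hypothetical illustration, not the original program: the index is assumed to map m/e to (spectrum ID, intensity) pairs from the abbreviated file, candidate sets are combined with Python set intersection for brevity rather than the list-scanning method described later, and `PeakSearch`, `add_peak`, and `back` are invented names.

```python
class PeakSearch:
    """Hypothetical sketch of one interactive peak/intensity search session.

    index maps m/e -> list of (spectrum_id, intensity) pairs from the
    abbreviated file; factor is the integer intensity range factor.
    """

    def __init__(self, index, factor):
        self.index = index
        self.factor = factor
        self.history = []  # candidate sets after each successful step

    def add_peak(self, mass, intensity):
        entries = self.index.get(mass, [])
        low, high = intensity // self.factor, intensity * self.factor
        passing = {sid for sid, inten in entries if low <= inten <= high}
        candidates = passing & self.history[-1] if self.history else passing
        if candidates:
            self.history.append(candidates)
        # Report peaks in the file, spectra passing the filter, and the
        # surviving candidates (None means the combination failed and the
        # previous set in self.history is still available for listing).
        return len(entries), len(passing), (candidates or None)

    def back(self):
        """Go back one step, discarding the most recent peak."""
        if self.history:
            self.history.pop()
        return self.history[-1] if self.history else None
```

A failed peak leaves the previous candidate set untouched, which corresponds to the option of going back and listing the earlier references.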

At the present time, the program accepts up to twenty-five peaks and intensities, although it could readily be expanded to one hundred or more. However, it has been the experience of the chemists using the system that two or three pertinent peaks will narrow down the possibilities very rapidly. Of course, the judgment of many years of experience goes into the choice of the peaks that the chemist selects for entry into the search program. If one enters a peak and intensity that results in no spectra in the file with that combination of peaks and intensities, it is possible to go back and produce the previous references, or to go back one step and continue.

An example of the search program is shown in Figure 4. In this example, a sample of blood from a drug overdose patient at a local hospital was put through a GC-MS system, and three peaks from the resulting mass spectrum were entered into the search program. Of the possible compounds found in the search, it was immediately clear that thymol (2-isopropyl-5-methylphenol) was the most reasonable possibility. Thymol is a common flavoring agent and, while not harmful in itself, correctly suggested that an excess of a common cough syrup had been ingested.

Figure 5 is a sample search for a component isolated from a secretion of a rove beetle, Bledius mandibularis. In this case the search program did not find an exact match, but the resulting list of possibilities included a number of gamma lactones (the C11, C16, and C18 lactones and the lactone with ID number 8060). The similarity of the unknown to the gamma lactone spectra indicated the probability that the unknown was a related gamma lactone. Later, comparison with a known sample of the C12 gamma lactone confirmed the structure and identity of the unknown.

In a search (not shown) for a long-chain hydrocarbon, n-dodecane, which might be expected to take a long time because most of its peaks are shared with other compounds in the file, only three masses (112, 127, and 170) and 1.08 cpu seconds were required to narrow the search down to three different molecules, one of which was the correct answer.

The next example again illustrates the use of the file even when the compound itself is not in the file. A chemical constituent of a grape-flavored chewing gum was isolated and found to have major peaks at m/e 151, 119, 92, and 65. However, there were no spectra in the file with this combination of peaks and similar intensities. After some experimenting, the peaks at m/e 151, 92, and 65 were entered, and the resulting answer, shown in Figure 6, was methyl aminobenzoate. Methyl p-aminobenzoate has major peaks at m/e 151, 120, 92, and 65. This answer suggested that it was quite reasonable to consider the unknown to be the ortho isomer, which loses methanol (CH3OH, 32 mass units, giving m/e 119), whereas the para isomer loses only methoxy (CH3O, 31 mass units, giving m/e 120). Thus, both the similarity of the spectra and the apparent anomaly of the m/e 119 and 120 peaks could be explained.

The time required for a search is quite short because of the structure of the file and the method used to determine which references contain the peaks (with the proper intensities) required by the chemist. A simple algorithm is used to intersect two lists, one containing the previously found references and the other containing the newly found references. First, the lists are scanned to find the highest first reference number. Using this "Hiref" value, the other list is then scanned until either the same number or a number higher than Hiref is found. If it is the same number, a match has been found and that reference number is stored. If instead a reference number greater than Hiref is found, this new, higher reference becomes Hiref and scanning of the other lists continues. This alternate scanning down the lists is very rapid and can be extended to intersect many lists simultaneously. An example of a four-list intersection is given in the imbedded molecular formula search program.
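The Hiref scheme can be sketched for any number of lists. This is a reconstruction from the description above (the algorithm is credited in the acknowledgments to Richard J. Feldmann), assuming each reference list is sorted in ascending ID order; the function name `intersect` is hypothetical.

```python
def intersect(lists):
    """Intersect ascending lists of reference numbers by alternate scanning.

    Hiref is the highest reference number currently required; every list is
    advanced until it reaches a value >= Hiref. A list that overshoots sets
    a new Hiref, and a value present in every list is a match.
    """
    if not lists or any(len(lst) == 0 for lst in lists):
        return []
    pos = [0] * len(lists)
    hiref = max(lst[0] for lst in lists)  # highest first reference number
    matches = []
    while True:
        agreed = True
        for i, lst in enumerate(lists):
            while pos[i] < len(lst) and lst[pos[i]] < hiref:
                pos[i] += 1
            if pos[i] == len(lst):
                return matches  # one list is exhausted; no more matches
            if lst[pos[i]] > hiref:
                hiref = lst[pos[i]]  # overshoot: raise Hiref and keep scanning
                agreed = False
        if agreed:
            matches.append(hiref)
            for i in range(len(lists)):
                pos[i] += 1
            if any(pos[i] == len(lst) for i, lst in enumerate(lists)):
                return matches
            hiref = max(lst[pos[i]] for i, lst in enumerate(lists))
```

Each list is read only once from front to back, so the cost grows with the total length of the lists rather than with their product.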

After the final reference list is obtained, the ID numbers and names of the compounds are printed. The structure and search of the files for the name printout are very similar to those of the peak/intensity file. Starting with the ID number located by the above search, the first disk file serves to locate the starting block and word in the second disk file where the name is actually stored. The name is stored in parts, five characters per computer word: the PDP-10 is an ASCII machine with a 36-bit word and uses 7 bits per character, so five characters occupy 35 bits. Subtracting the number in the ID cell from the number in the ID + 1 cell gives the length of the name in PDP-10 words, which is about one-fifth of the number of characters in the name. The computer is then instructed to print out that number of words, starting from the location determined above, which yields the complete name of the compound.
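Five-characters-per-word storage can be sketched with Python integers standing in for 36-bit words. The layout (characters left-justified, low-order bit unused, last word padded with blanks) is an assumption made for illustration; `pack_name` and `unpack_name` are hypothetical names.

```python
CHARS_PER_WORD = 5   # five 7-bit ASCII characters fill 35 of the 36 bits

def pack_name(name):
    """Pack a name into 36-bit integers, five characters per word.

    Characters are left-justified and the low-order bit is left unused; the
    last word is padded with blanks (both layout choices are assumptions).
    """
    words = []
    for start in range(0, len(name), CHARS_PER_WORD):
        chunk = name[start:start + CHARS_PER_WORD].ljust(CHARS_PER_WORD)
        word = 0
        for ch in chunk:
            code = ord(ch)
            if code > 0x7F:
                raise ValueError("only 7-bit ASCII can be packed")
            word = (word << 7) | code
        words.append(word << 1)  # 35 bits of text, spare bit at the bottom
    return words

def unpack_name(words):
    chars = []
    for word in words:
        text = word >> 1
        for shift in (28, 21, 14, 7, 0):
            chars.append(chr((text >> shift) & 0x7F))
    return "".join(chars).rstrip(" ")
```

The number of words per name, the quantity obtained from the difference of two adjacent ID cells, is the character count divided by five and rounded up; a 26-character name occupies 6 words.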

Molecular Weight Search. The molecular weight search program finds all the references with a given molecular weight and lists their ID numbers and names, if desired. There were 426 different molecular weights in the 8124 spectra, ranging from 2 to 1318. The distribution of molecular weights is shown in Figure 7. The file structure and search method are again the same as in the peak search program. Tables IV and V are printouts of the actual disk blocks used in the search for a molecular weight of 109. The 109th cell in block 1 of the MS.MW1 file indicates whether or not there are references to molecular weight 109, and the number of references is found by using the value stored in the 110th cell: the value in the 110th cell minus the value in the 109th cell minus 1 is 8, the number of references for molecular weight 109. From the value in the 109th cell, which is 1816, the references to molecular weight 109 are found in block ((1816 - 1)/128) + 1 = 15, at word modulo(1816 - 1, 128) + 1 = 24. Again, integer division is used to find these values. In Table V, which is a printout of block 15, the 24th word is 174, the first reference on the list in Figure 8. These ID numbers are then passed to the same name printout program used in the peak/intensity search.

Molecular Formula Search. The molecular formula search program is really two separate programs, one to search for complete molecular formulas and one to search for partial or imbedded molecular formulas. The imbedded molecular formula search is similar to the Hetero Atom In Context (HAIC) Index that Chemical Abstracts has recently begun to issue with every Volume Index.

The complete molecular formula search finds all the references to a given molecular formula. After the number of references found is printed, the chemist has the option of obtaining a printout of the ID numbers and names of the references. Of the 8124 components of the file, there were 2264 different molecular formulas.

The file structure for the complete molecular formula search, as well as for the imbedded molecular formula search, is similar to those previously described, but has some substantial differences. Clearly, a peak value, a molecular weight, and an ID are all numbers and can readily be used to point to a given disk block and to a word or cell within that block. Molecular formulas, however, are character strings, so a different technique must be used to put the molecular formula "values" on a disk file for retrieval. The method used is called the hash table method, or hash coding (10).

Hashing assigns a number to any character string. Although different molecular formulas may produce the same number, or numerical key-value, such "collisions" are rare and can be handled without difficulty. This enables the program to use the numerical key-value in the same way that the peak value, molecular weight value, and ID number were used previously, to address a given block and a given cell within that block directly. For example, the molecular formula string Ar, when put through the hash function, gives the numeric key-value 4682565. The modulo arguments used to find the block and cell values are a bit more complex for the hash file because of factors such as the size of the hash file and the size of the cell. In contrast to the peak and molecular weight files, which use only one word per cell, the hash file uses two words per cell: one for the numeric key-value and the second for the pointer value into the second file. The block containing the numeric key-value is found by first taking modulo(hval, blkval), where hval is the numeric key-value and blkval is the size (in PDP-10 128-word blocks) of the hash file on the disk (in this case 4096 blocks). The result of this is a number, Val, which is then used in the second modulo calculation of Block = modulo(Val + 1),64 + 1. The value 64 rather than 128 is used because there are two words per cell rather than one, while the PDP-10 block is still 128 words. The record is found in word (modulo(Val, 64) *2) + 1. For Ar, this results in a search of block 6, word 13, which is found to hold the value 4682565. The next word of this two-word cell holds the pointer value (1), which locates the block (1) and word (1) in the reference file for the list of references corresponding to that molecular formula. Collisions are handled by comparing the numeric key-value stored at the addressed cell with the numeric key-value computed for the query formula.
If they are the same, everything is fine; if they are different, the program simply examines the following cells until either the numeric key-values match or a cell containing -1 values is encountered. Finding -1 values shows that the molecular formula is not in the file.
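A hash table with two-word cells and probing into the following cells can be sketched as follows. The hash function here is CRC-32 from Python's `zlib` module, a stand-in chosen only for illustration; it is not the function used in the paper, so it will not reproduce the key-value 4682565 for Ar. The table size, the wrap-around at the end of the table, and the class and method names are also assumptions.

```python
import zlib

EMPTY = -1  # unused cells hold -1, as on the disk file

def key_value(formula):
    """Hypothetical hash: CRC-32 of the formula string (NOT the paper's function)."""
    return zlib.crc32(formula.encode("ascii"))

class FormulaHashTable:
    """Two-word cells (key-value, pointer) with probing into following cells."""

    def __init__(self, cells=1024):
        self.keys = [EMPTY] * cells
        self.pointers = [EMPTY] * cells

    def _probe(self, key):
        # Start at the home cell and walk forward, wrapping at the end
        # (the wrap-around is an assumption).
        start = key % len(self.keys)
        for step in range(len(self.keys)):
            yield (start + step) % len(self.keys)

    def insert(self, formula, pointer):
        key = key_value(formula)
        for cell in self._probe(key):
            if self.keys[cell] in (EMPTY, key):
                self.keys[cell] = key
                self.pointers[cell] = pointer
                return cell
        raise RuntimeError("hash table is full")

    def lookup(self, formula):
        key = key_value(formula)
        for cell in self._probe(key):
            if self.keys[cell] == EMPTY:
                return None  # reached -1: formula is not in the file
            if self.keys[cell] == key:
                return self.pointers[cell]
        return None
```

Like the scheme described above, this sketch compares only numeric key-values, so two formulas that hashed to the same key-value would be treated as one.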

Tables VI and VII are printouts of the blocks referred to above. Note that in this case each cell is two words, rather than one word as in the peak, molecular weight, and ID files. In addition, the number of references must be stored in the first word of the list in the reference file, because taking the difference between two numeric key-values in the hash file has no meaning. Thus, in block 1, word 4 the value of 5 - 1 = 4 is the number of Ar spectra, and the next 4 words contain the ID numbers of those references. Again, a value of -1 indicates that the list of references for that particular formula is finished. The ID numbers thus picked up are passed to the same name printout program described previously.

The partial or imbedded molecular formula search finds those compounds containing a given combination of atom groups. For example, a search for all compounds containing the two atom groups C6 and H6 would find C6H6 as well as C6H6NO2, C6H6O2, etc. A search for all C6 and all H6 atom groups will therefore usually give rise to a greater number of references than will a search for just C6H6. A search for the atom groups C and H4 will find methane (CH4) and also methanol (CH4O) and methanethiol (CH4S). Examples of the two search programs are shown in Figures 9 and 10. In breaking down the 8124 complete molecular formulas, 23,499 partial formulas were generated, of which 208 were different. The atom groups occurring more than 400 times (with their frequencies of occurrence) are shown in Table VIII.
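Breaking a formula into atom groups and testing for imbedded groups can be sketched in a few lines. This assumes simple formulas without parentheses or charges, and it scans a list of formulas directly instead of performing a hash lookup per atom group followed by list intersection as the actual program does; the regular expression and function names are illustrative.

```python
import re

# One element symbol followed by an optional count, e.g. "C6", "Cl", "O2".
ATOM_GROUP = re.compile(r"([A-Z][a-z]?)(\d*)")

def atom_groups(formula):
    """Split a simple formula (no parentheses or charges) into atom groups."""
    groups = set()
    for element, count in ATOM_GROUP.findall(formula):
        # Write a count of 1 the way formulas usually do, as the bare symbol.
        groups.add(element if count in ("", "1") else element + count)
    return groups

def imbedded_search(query_groups, formulas):
    """Formulas containing every requested atom group (exact counts)."""
    wanted = set(query_groups)
    return [f for f in formulas if wanted <= atom_groups(f)]
```

A query for C and H4 then returns CH4, CH4O, and CH4S but not C2H4, because the atom group C2 is not the atom group C.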

The imbedded molecular formula search program combines the methods used in all the previous programs. The program accepts each of the atom groups (with a current, arbitrary limit of four groups), performs a hash table lookup for each, finds the references, and stores them in disk files. The previously described list intersection program is then used to determine the references common to all the atom groups entered by the chemist. The resulting list is stored on the disk for use if the chemist requests a printout of the references meeting his criteria. Again, the name printout program is used to print out the ID numbers and their corresponding names.

Spectrum Printout. The last of the options in the mass spectral search system allows a listing of the full or complete spectrum (not just the two most intense peaks in every 14 m/e interval used by the peak search program). The file structure for this program is identical to that of the peak search and molecular weight search.

Using the ID number found previously in the peak/intensity, molecular weight, or molecular formula searches, the program first finds the pointer to the spectrum and the number of peaks in the spectrum. If the chemist wants a printout, he then has the option of looking at all the peaks and their intensities or at only a range of peaks. In addition, there is an option of listing only peaks above a minimum intensity level. An example of the printout is shown in Figure 11. To conserve space in the computer, a peak and its intensity are stored in one 36-bit computer word, each occupying 18 bits.
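The half-word packing and the printout options can be sketched as follows. Which half holds the mass and which the intensity is an assumption, as are the function names; only the 18-bit split comes from the text.

```python
HALF_WORD_BITS = 18
HALF_WORD_MASK = (1 << HALF_WORD_BITS) - 1  # largest 18-bit value, 262143

def pack_peak(mass, intensity):
    """Pack one peak into a 36-bit word: mass in the left half, intensity in
    the right half (which half holds which is an assumption)."""
    if not (0 <= mass <= HALF_WORD_MASK and 0 <= intensity <= HALF_WORD_MASK):
        raise ValueError("value does not fit in 18 bits")
    return (mass << HALF_WORD_BITS) | intensity

def unpack_peak(word):
    return word >> HALF_WORD_BITS, word & HALF_WORD_MASK

def list_peaks(words, low_mass=None, high_mass=None, min_intensity=0):
    """Printout options: an optional mass range and a minimum intensity."""
    peaks = []
    for word in words:
        mass, intensity = unpack_peak(word)
        if low_mass is not None and mass < low_mass:
            continue
        if high_mass is not None and mass > high_mass:
            continue
        if intensity < min_intensity:
            continue
        peaks.append((mass, intensity))
    return peaks
```

Packing two values per word halves the storage for the full spectrum file, the largest file in the system.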

Computer Programs, Files, and Times. All of the file generation and file search programs are written in FORTRAN IV (except for a few assembly language subroutines) and run on a time-sharing Digital Equipment Corp. PDP-10. The main memory of the system has a 1.8-µsec cycle time. All of the programs and files are stored on disk packs. The collection of 8124 spectra was generously provided by Professor K. Biemann. The reference file contains numerous duplicates, and no attempt was made to remove them. Some effort has been made to correct errors in the data base; however, it is expected that the quality of the file will improve more rapidly as the system is used by more chemists. There are approximately 2400 spectra from the ASTM Committee E-14 Subcommittee IV spectrum collection (11), 2000 spectra from the Dow Chemical Company (12), 1800 from the American Petroleum Institute (13), and the remaining 1900 from the Mass Spectrometry Data Centre (14) and the laboratory of Professor Biemann at MIT (15). The various file generation programs require 6,000 to 14,000 words of computer core to run. File generation times vary from about 1 minute of cpu time for the generation of the two molecular weight files up to about 125 cpu minutes to generate the spectrum printout files. The search system requires 12,000 words of core, and most searches use from 2 - seconds of cpu time, including searching and printout, and 5 to 15 minutes of elapsed or human time. In almost all cases, the printout programs require more cpu time than the search programs.

While the search system has been designed to be used interactively, it would be possible to change the programs to run in a batch programming environment, with the ability to enter an entire spectrum (or spectra from a GC/MS run) and have the program automatically select the two most intense peaks from every 14 m/e interval, perform the search, and print out a list of results.

The files used by the search program are all stored on PDP-10 RP02 disk packs and are read into core one block at a time rather than as entire files. This allows the search system programs to remain small. The sizes of the disk files are shown in Table IX.

The various intermediary files used by the intersection program require 1-5 blocks. The entire set of files requires about 1/4 of a standard PDP-10 disk pack.

RESULTS AND DISCUSSION

It appears that the abbreviated peak file is a good fingerprint for retrieving a compound for identification, and a valuable guide for human interpretation of a mass spectrum. The ability to sit at a terminal and interact with a highly conversational program has been found to stimulate the chemist's interpretation. Clearly, some results found are probably unrelated to the compound in hand, but their rejection is in itself useful, since the chemist is now free to consider other alternatives. The extensive options as to types of searches and options within searches (e.g., the variable intensity factor) along with the instantaneous interactive nature of the system, have been found to make the chemist feel that the system has been written and tailored to his needs.

As the value of the system depends on the size of the data base, plans are under way to expand the file. Of even greater value to the chemist would be the ability to do a partial (or complete) structure search on the file, rather than a partial or complete formula search, which is not as specific. For instance, the ability to find examples of fragmentation patterns for molecules containing the nitrogen mustard group, N-C-C-Cl, might be very useful. This technique, known as substructure searching, is being developed in this laboratory both for the Wiswesser Line Notation (WLN) (16) and for the Chemical Abstracts Service connection tables (17). For the latter type of data base, a computer search system is under study that will allow interactive file searching.

While the search system is small and efficient, the files are quite large and require a large on-line disk storage capacity, available at few computer installations. The largest file is the full spectrum file, containing all the peaks and their intensities. One possible alternative to storing such a large file in the computer is to put the full spectrum file in a microfiche retrieval unit in the chemist's laboratory, driven remotely by the search system program. In such a device, the ID number would be a pointer to a given microfiche card and page number, in a manner identical to the pointer system used for the peak, molecular weight, and spectrum lookup disk files described in the previous section. As the file grows in size, microfiche becomes economically very attractive compared with the cost of on-line computer disk storage. Also, the microfiche reader can be operated manually and used for other storage purposes.

ACKNOWLEDGMENTS

The author wishes to express his appreciation to Richard J. Feldmann for the extremely efficient list intersection algorithm. The author also wishes to thank Henry M. Fales, G. W. A. Milne, Robert J. Highet, D. J. Pedder, and J. W. Wheeler for their generous use and criticism of the search system, and K. Biemann for the data base.

Received for review March 23, 1972. Accepted June 13, 1972. Presented in part at the 163rd National Meeting of the American Chemical Society, Boston, Mass., April 9-14, 1972.

References

(1) B. Pettersson and R. Ryhage, Ark. Kemi, 26, 293 (1967).

(2) S. L. Grotch, Anal. Chem., 42, 1214 (1970).

(3) Ibid., 43, 1362 (1971).

(4) L. E. Wangen, W. S. Woodward, and T. L. Isenhour, ibid., p 1605.

(5) L. R. Crawford and J. D. Morrison, ibid., 40, 1464 (1968).

(6) B. A. Knock, I. C. Smith, D. E. Wright, and R. G. Ridley, ibid., 42, 1526 (1970).

(7) H. S. Hertz, R. A. Hites, and K. Biemann, ibid., 43, 681 (1971).

(8) R. J. Feldmann, S. R. Heller, K. P. Shapiro, and R. S. Heller, J. Chem. Doc., 12, 41 (1972).

(9) S. R. Heller, DCRT/CIS, "Mass Spectral Search System User's Manual," Division of Computer Research and Technology, Bethesda, Md., March 1972.

(10) R. Morris, CACM, 11, 38 (1968).

(11) Uncertified Mass Spectra, Subcommittee IV, ASTM Committee E-14 (1960).

(12) R. S. Gohlke, Ed., Uncertified Mass Spectral Data, Dow Chemical Company, Midland, Mich., 1963. Distributed through the ASTM Committee E-14.

(13) Catalog of Selected Mass Spectral Data, American Petroleum Institute Research Project 44.

(14) Mass Spectrometry Data Centre, AWRE, Aldermaston, Berks, England.

(15) Professor K. Biemann, MIT, private communication (1971).

(16) R. J. Feldmann and D. A. Koniver, J. Chem. Doc., 11, 151 (1971).

(17) R. J. Feldmann and S. R. Heller, ibid., 12, 48 (1972).