G. W. A. Milne
Division of Cancer Treatment, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20205, USA

W. L. Budde
Environmental Protection Agency, Cincinnati, Ohio 45219, USA

S. R. Heller
Environmental Protection Agency, Washington, DC 20460, USA

D. P. Martinsen
Fein-Marquart Associates, Baltimore, Maryland 21212, USA

R. G. Oldham
Radian Corporation, Austin, Texas 78766, USA


Over 1400 electron ionization mass spectra of selected organic compounds have been measured under carefully defined conditions. This paper describes the variables that were controlled, such as sample purity and spectrometer calibration, and reports the quality of the resulting spectra and the cost of the measurements.


During the last ten years, four agencies of the US Government1 have cooperated in the building of what has become known as the NIH-EPA Mass Spectral Data Base.2 This file currently contains 38 805 spectra, every chemical being represented only once. It is therefore much more developed than its earliest predecessor which, in 1970, had only 8000 spectra representing some 5000 compounds.3 The data base is disseminated on magnetic tape,4 in printed form5 and via the NIH-EPA Chemical Information System, an interactive search system which is available worldwide on computer networks.6 In these various forms the data base is used heavily in areas such as biomedical research7 and environmental monitoring.8

A major concern in the management of this file is the quality of the spectra it contains. Methods have been devised to control the quality of the chemical nomenclature and structure associated with the spectra. The quality of each spectrum is estimated algorithmically and, in a recent experiment, over 1400 spectra have been measured under carefully controlled conditions. The purpose of this paper is to describe this work and to report the quality of the spectra that were measured and the cost of the measurements.


The only chemical data elements in the Mass Spectral Data Base are those that are concerned with the identity of each compound. In the original data base these comprised a name and synonyms, molecular weight (MW) and molecular formula. A problem that arose in the early 1970s was that duplicate spectra of the same compound were being added to the file as spectra were acquired. Further, it was not possible to detect this in all cases because the name used for the compound may have been different from that used with the earlier data base entry. This led to results such as that shown in Fig. 1, in which a search for all compounds with a molecular formula of C6H6 gave a number of spectra of benzene, as well as spectra of some of its acyclic isomers.

In 1974, it was decided to remove redundant entries from the file as they served little purpose and were driving up the costs of computer storage for the search system. The name(s) of every compound were sent to the Chemical Abstracts Service (CAS) where, under contract to EPA, CAS identified the CAS Registry number (RN) for the chemical by a search of the registry nomenclature file.9 Where the name search failed, a new Registry number was assigned to the compound. A few compounds could not be satisfactorily identified and those spectra were expunged from the file, but the majority were assigned the correct CAS Registry number in this way.

Upon completion of the registration step, the entire data base was sorted by Registry number. This brought together all spectra for each compound because, although the compound may have been identified in each spectrum with a different name, it had the same Registry number in every case. An additional benefit which was realized in this process was the identification of each compound in terms of its CAS Registry number and, in turn, its standard chemical names as well as all known synonyms. These are all collected together in the CAS Registry and include nonstandard chemical names as well as trivial names and trade names. The Registry number is also used to retrieve from the CAS data the structure records for the chemicals. These records are the basis of the structure and nomenclature searching capability of the CIS.10 The next logical step in this overhaul of the data base was to discard all but one mass spectrum of each chemical. To do this, it was necessary to decide which of several spectra of a compound was the best one. An approach to this question is described in the next section.


As the mass spectrometry of organic compounds developed during the 1960s, spectroscopists became familiar with the types of errors that occur frequently in recorded spectra.11 Responses ranging from modification of experimental procedures to redesign of spectrometers were adopted to eliminate many of these errors. The result is that a conscientious analyst using a modern spectrometer can produce mass spectra which rarely, if ever, contain such errors. In 1978, McLafferty's group published an algorithm12 which examines a mass spectrum for the occurrence of such standard errors. The program computes a number, referred to here as the Quality Index (QI), which was proposed as an indicator of the quality of the spectrum, in terms of the absence of standard errors.

The McLafferty algorithm employed seven Quality Factors (QF), each having a value between zero and one. Multiplication together of all these Quality Factors and multiplication of the product by 1000 leads to the Quality Index for the spectrum. The Quality Factors were the source of the spectrum, the ionization conditions, higher molecular weight impurities, illogical neutral losses, isotopic abundance accuracy, the number of peaks, and the lower mass limit of the reported spectrum. A number of modifications were made to this algorithm for application to the Mass Spectral Data Base. Since most of the spectra in question are no longer derived from published sources, the relevance of McLafferty's first Quality Factor is reduced and this Factor was dropped. The remaining six Quality Factors were left essentially unchanged, but were augmented by the addition of three new Factors. These were the sample purity, the length of time since the last spectrometer calibration, and the similarity between the most recently measured calibration mass spectrum and the data base copy of the spectrum.

The resulting nine Quality Factors are listed in Table 1 and each of them was determined by application of formulas described below. No Quality Factor may be less than zero or greater than one, and rounding is used to keep the number within this range.

Table 1. Quality factors used

Quality Factor   Feature tested

QF1              Electron voltage
QF2              Peaks above molecular weight
QF3              Illogical neutral losses
QF4              Isotopic abundances
QF5              Number of peaks
QF6              Lower mass limit
QF7              Sample purity
QF8              Calibration date
QF9              Similarity Index of calibration mass spectrum
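As an illustration of how the nine Quality Factors combine into a Quality Index, the following minimal Python sketch multiplies the factors together and scales the product by 1000. The function names and the example values are hypothetical, and the restriction of each factor to the range zero to one is implemented here as a simple clamp, which is an assumption about how the range limit is enforced.

```python
from math import prod


def clamp01(x):
    """Keep a Quality Factor inside the closed interval [0, 1] (assumed clamp)."""
    return max(0.0, min(1.0, x))


def quality_index(factors):
    """Multiply the Quality Factors together and scale the product by 1000."""
    return 1000.0 * prod(clamp01(f) for f in factors)


# Hypothetical values for QF1..QF9; not taken from any real spectrum.
example = [1.0, 1.0, 1.0, 0.95, 0.8, 1.0, 0.99, 1.0, 0.9]
print(round(quality_index(example), 2))
```

Because the factors are multiplied, a single factor of zero forces the Quality Index to zero regardless of the other eight values.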

QF1-Electron voltage

The first Quality Factor, for ionization conditions, is defined by the equation

QF1 = SQRT[(V - 10)/40]

where V is the reported electron beam potential, typically 70 eV.

QF2-Peaks above the molecular ion

The sum of the intensities of all peaks with mass greater than (MW+3+N) is defined as I. The quantity N is defined as


in which Br, Cl, etc. denote the numbers of atoms of those elements in the molecule. The quantity N is used to allow for normal isotopic contributions from these elements. QF2 is related to both I and M, the molecular ion intensity, by the equation


QF3-Illogical neutral losses

The question as to which fragment ions can be assumed to correspond to 'illogical' neutral losses depends upon elemental composition, and a table of such disallowed losses has been proposed.12 In the absence of elements such as Si, Cl and Br, losses of 2, 3, 4, 5, 7, 8, 10, 11, 12, 13, and 14 amu are not expected and their appearance detracts, in proportion to their intensity, from QF3 according to the equation


where M is the molecular ion intensity, L is the intensity of the fragment ion and T is the maximum tolerated intensity for that neutral loss, as given in McLafferty's table.12

QF4-Isotopic abundances

The intensities of molecular ions containing minor isotopes may be calculated from the molecular formula of the compound. If the observed intensities differ from the expected values, the deviation contributes to QF4 according to

QF4 = (M + 5 - D) / (M + 5)

where D is the larger deviation in excess of a 10% relative tolerance of either the [M+1]+ or the [M+2]+ ion intensities.
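A minimal Python sketch of the QF4 calculation is given below. It rests on one interpretation of D that is an assumption rather than something stated in the text: for each of the [M+1]+ and [M+2]+ ions, the absolute difference between observed and expected intensity is reduced by 10% of the expected intensity, negative results are treated as zero, and D is the larger of the two values. All intensities are assumed to be on the same scale as M, and the function and argument names are hypothetical.

```python
def qf4(m_intensity, observed, expected, tolerance=0.10):
    """QF4 = (M + 5 - D) / (M + 5), limited to the range 0..1.

    observed, expected: intensities of the [M+1]+ and [M+2]+ ions,
    on the same scale as m_intensity.
    """
    excesses = []
    for obs, exp in zip(observed, expected):
        deviation = abs(obs - exp)
        allowed = tolerance * exp          # 10% relative tolerance (assumed reading)
        excesses.append(max(0.0, deviation - allowed))
    d = max(excesses)                      # the larger excess deviation
    value = (m_intensity + 5 - d) / (m_intensity + 5)
    return max(0.0, min(1.0, value))


# Hypothetical intensities: M = 100, expected [M+1]+ = 6.6 and [M+2]+ = 0.2
print(qf4(100, observed=[6.8, 0.2], expected=[6.6, 0.2]))  # within tolerance
print(qf4(100, observed=[9.0, 0.2], expected=[6.6, 0.2]))  # [M+1]+ too intense
```

The constant 5 in the formula keeps the penalty moderate when the molecular ion itself is weak.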

QF5-Number of peaks

Based upon data base statistics, McLafferty suggested that the number (N) of peaks in a mass spectrum should be related to the total number (Z) of atoms in the molecule. Therefore QF5 is defined as

QF5 = N/ (Z + 15)

QF6-Lower mass limit

In the literature, the lower mass regions of mass spectra are frequently not reported. A Quality Factor QF6 was established to check for this point, although when spectra are being measured, the condition need never arise and so, in general, QF6 = 1. The equation for this Quality Factor is

QF6 = (M - M') / (M - 29)

where M is the molecular weight and M' is the lowest reported mass.
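The formulas for QF5 and QF6 can be sketched in Python as follows. The function names are hypothetical, values are limited to the range zero to one by a simple clamp, and the treatment of compounds with a molecular weight of 29 or less (for which the QF6 denominator is not positive) is an assumption made only to keep the sketch well defined.

```python
def clamp01(x):
    """Limit a Quality Factor to the range 0..1."""
    return max(0.0, min(1.0, x))


def qf5(n_peaks, n_atoms):
    """QF5 = N / (Z + 15): peaks reported relative to atoms in the molecule."""
    return clamp01(n_peaks / (n_atoms + 15))


def qf6(mol_weight, lowest_mass):
    """QF6 = (M - M') / (M - 29): penalizes a high lower mass limit."""
    if mol_weight <= 29:
        return 1.0  # assumption: no penalty is applied to very light compounds
    return clamp01((mol_weight - lowest_mass) / (mol_weight - 29))


# Benzene, C6H6: Z = 12 atoms, MW = 78. The peak counts are hypothetical.
print(round(qf5(20, 12), 4))   # 20 peaks reported
print(qf5(40, 12))             # 40 peaks: ratio exceeds one and is limited to 1.0
print(qf6(78, 29))             # spectrum reported down to m/z 29
print(round(qf6(78, 40), 4))   # spectrum cut off at m/z 40
```

With these definitions a spectrum reported down to m/z 29 or lower always receives QF6 = 1.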

QF7-Sample purity

The first of the new Quality Factors, QF7, is determined by the purity of the sample. QF7 is given by

QF7 = P/100

where P is the percentage purity of the material. An important rider to this rule is that if P < 90, QF7 is arbitrarily set to zero, thus ensuring that the overall QI will be zero. It might be noted that this makes sample purity extremely important; a mass spectrum from a material which is less than 90% pure will inevitably have a QI of zero and will be rejected in favor of a poorly measured or reported spectrum of a purer sample.

QF8-Calibration date

The time elapsed since the last calibration is the basis of QF8, whose value is one until this elapsed time exceeds one week. Thereafter, QF8 is given a value of 0.9.
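The rules for QF7 and QF8 are simple enough to state directly in code. The sketch below is illustrative only: the function names and the example dates are hypothetical, and "one week" is taken to mean seven days.

```python
from datetime import date


def qf7(purity_percent):
    """QF7 = P/100, except that any purity below 90% gives zero."""
    if purity_percent < 90:
        return 0.0
    return min(1.0, purity_percent / 100)


def qf8(calibrated_on, measured_on):
    """QF8 = 1 until more than one week has elapsed since calibration, then 0.9."""
    elapsed_days = (measured_on - calibrated_on).days
    return 1.0 if elapsed_days <= 7 else 0.9


print(qf7(99.5), qf7(95), qf7(89.9))
print(qf8(date(1981, 3, 1), date(1981, 3, 8)))   # exactly one week
print(qf8(date(1981, 3, 1), date(1981, 3, 9)))   # more than one week
```

The two rules differ sharply in severity: a stale calibration costs at most 10% of the QI, whereas a sample below 90% purity reduces the QI to zero.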

QF9-Similarity Index of calibration mass spectrum

At the time of calibration, the calibration spectrum is stored and the similarity between it and the standard library spectrum13 of the compound [bis-(pentafluorophenyl) phenyl phosphine, RN 5074-71-5] is computed by the Similarity Index program within the Mass Spectral Search System.14 This number, which lies between zero and one, becomes QF9, which is an indicator of spectrometer performance.

All the Quality Factors were calculated automatically by means of a computer program which also computed the Quality Index for each spectrum. Then, whenever spectra associated with the same CAS Registry number were encountered, the one with the highest QI was retained, the remainder being left in an archival file. When this process was completed with the earlier data base, about 22% of the entire data base was consigned to an archive file.15 For the spectra in the current working Mass Spectral Data Base, the average QI is 500.68. With spectra such as these, which had already been measured, both QF8 and QF9, which relate to calibration of the spectrometer, were arbitrarily assigned values of 0.90.

As new spectra are assimilated into the data base, each is assigned the appropriate Registry number and the data base is scanned for the existence of this number. The QI is then computed and, if the compound is new to the file, the entry is simply added to the data base. If the Registry number is already in the file, then the spectrum with the higher QI is retained and the other spectrum is archived. This process has led to a great improvement in the search system, as may be seen from Fig. 2. Here the search that was shown in Fig. 1, for isomers of C6H6, is repeated. In the output in Fig. 2 every distinct compound is cited just once, by name16 and Registry number, and the QI of each spectrum is also provided.
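The assimilation rule described above can be sketched in Python as follows. The record layout (a dictionary with "rn" and "qi" keys), the function name and the QI values are all hypothetical, and the handling of ties (the existing entry is kept) is an assumption, since the text does not say what happens when two spectra have equal QI.

```python
def assimilate(working, archive, new):
    """Add one spectrum, keeping only the highest-QI spectrum per Registry number.

    working: dict mapping Registry number -> retained spectrum record
    archive: list collecting every spectrum that is not retained
    new:     record with at least the keys "rn" and "qi"
    """
    current = working.get(new["rn"])
    if current is None:
        working[new["rn"]] = new          # compound is new to the file
    elif new["qi"] > current["qi"]:
        archive.append(current)           # displaced spectrum goes to the archive
        working[new["rn"]] = new
    else:
        archive.append(new)               # existing spectrum is at least as good


working, archive = {}, []
for record in (
    {"rn": "71-43-2", "qi": 480},    # benzene, hypothetical QI
    {"rn": "108-88-3", "qi": 610},   # toluene, hypothetical QI
    {"rn": "71-43-2", "qi": 720},    # second benzene spectrum, higher QI
):
    assimilate(working, archive, record)

print(sorted(working), [r["qi"] for r in archive])
```

After the three records are processed, the working file holds one spectrum for each Registry number and the lower-QI benzene spectrum has been moved to the archive.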


The Toxic Substances Control Act, which became law in 1976,17 required, amongst other things, that EPA compile a list, or "inventory," of all chemicals in commerce in the USA. In 1978, when the initial Inventory was compiled, it was determined that of the 43 278 chemical substances it contained, only 3862, or 8.92%, were represented in the Mass Spectral Data Base, which then contained 32 187 spectra. In the two years since then, this proportion has actually decreased to 8.55%. This is because compounds have been added to the Inventory since then but their mass spectra were unavailable and thus could not be added to the Mass Spectral Data Base. In this period the Mass Spectral file did grow from 32 187 to 38 805 entries, but most of the new spectra were not of Inventory chemicals.

Since the Mass Spectral file serves as a major resource in pollutant identification, and since most industrial pollutants are manufactured in or imported into the USA and are therefore in the TSCA Inventory, this situation clearly demanded correction. Accordingly, a list of Inventory chemicals absent from the Mass Spectral file was prepared and ordered by decreasing amount of annual commercial volume. This priority-ordered list was then provided to a laboratory which had successfully competed for a contract with EPA to do this work.18 The contractor was instructed to procure samples of chemicals on the list, assay their purity, purify them if necessary, and finally measure their mass spectra. Specifications for all these steps were carefully defined so that high quality mass spectra would emerge from the effort and their QI values should reflect this.

In order to comply with these requirements, a procedure was developed for the acquisition of new mass spectra, and between November 1980, when the work began, and the end of 1981, over 1400 spectra had been measured using this procedure. Upon acquisition, each chemical is checked for purity using chromatographic techniques. If the sample is found to be 99% pure or better, and if it exhibits adequate vapor pressure, the measurement of the mass spectrum is usually achieved by introduction of the compound into the ion source via a molecular leak inlet. Samples whose purity is determined to be between 90% and 99% are introduced into the mass spectrometer via the gas chromatographic inlet. In such cases, the actual measured purity is used in computing QF7, i.e. a worst case is assumed. For samples whose purity is below 90%, two courses of action are available. If sufficient resolution can be achieved, the sample can be admitted into the mass spectrometer through the gas chromatograph. This is permissible because the purity of the compound in the ion source is more important than the ultimate assay of the actual sample. Alternatively, if the impurities cannot be resolved satisfactorily from the compound of interest by gas chromatography, the material is subjected to purification efforts which are centered upon recrystallization and fractional distillation.

A few materials are not amenable to gas chromatography and with these compounds the sample was introduced into the mass spectrometer via a direct probe, an estimate of the sample purity being made by external means. When the mass spectrum was measured, it was written on magnetic tape. Information concerning the spectrometer calibration and the compound identity was added and the tape was then transferred to a second contractor19 for data analysis as is described in the next Section. This process precludes any manual typing of data and has contributed to a very substantial decrease in the data errors that appeared in the file when the collection of mass spectral data frequently involved manual keying of data as they were collected from a written or printed source.


Prior to addition to the data base of any new mass spectra from the work described here, the average QI of all spectra in the files was 500.68 and the standard deviation from this mean was 220.41. If all spectra with QI = 0 were omitted from this calculation, the mean QI was 521.83 (SD 198.98).

For purposes of comparison, a batch of 688 spectra from the current work was selected randomly and, for this group of spectra, the mean QI was 700.69 (SD 233.90). Removal from this group of the 25 spectra with QI = 0 left 663 spectra with a mean QI of 727.11 (SD 193.73).

Spectra delivered by the contractor with a QI of zero are returned for remeasurement and so it is legitimate to compare the earlier mean QI of 521.83 (SD 198.98) with the figure for the new spectra of 727.11 (SD 193.73). Such a comparison indicates that a 39.34% increase in mean QI has been achieved, while the standard deviation has dropped slightly, by 2.64%.
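The two percentages quoted above follow directly from the reported means and standard deviations, as the short Python check below shows. The helper name is hypothetical; the four input numbers are those given in the text.

```python
def percent_change(old, new):
    """Relative change from old to new, expressed as a percentage."""
    return 100.0 * (new - old) / old


mean_change = percent_change(521.83, 727.11)   # mean QI, earlier file vs new spectra
sd_change = percent_change(198.98, 193.73)     # corresponding standard deviations

print(round(mean_change, 2), round(sd_change, 2))
```

A positive result is an increase and a negative result a decrease, so the standard deviation appears with a minus sign.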

Of the 25 new spectra with QI = 0, 14 had QF7 = 0, i.e. samples with a purity of less than 90% were used. For eight of the 25, QF3 (illogical neutral losses) was zero; four had QF2 = 0 because they contained peaks for which m/z > MW, and two spectra failed the isotopic abundance test (QF4). Three spectra failed more than one of these criteria simultaneously. From this, it may be concluded that the sample purity test, which underlies QF7, is the most critical. Three other Quality Factors, QF2, QF3 and QF4, are also of importance. Of the remaining Quality Factors, those concerning electron voltage (QF1) and spectrometer calibration date (QF8) had no effect upon the QI values in this group of spectra, because the contractor was performing as instructed. The remaining three Quality Factors, dealing with the number of peaks in the spectrum (QF5), the lowest m/z reported (QF6), and the quality of the calibration spectrum (QF9), rarely had a major effect upon the overall QI.

A summary of the means and standard deviations for all the Quality Factors in the same 688 spectra is presented in Table 2.20 From this table, it can be seen that QF5, which is related to the number of peaks in the spectrum, not only has the lowest mean, but the highest standard deviation. This suggests that this Quality Factor, while not without value, may be dependent upon factors other than spectral quality. The mean value of QF9, reflecting the similarity of the calibration spectrum with its library spectrum, is surprisingly low, at 0.900359, but its standard deviation is also low, suggesting that a systematic error, such as a poor library spectrum of the compound, may underlie the low value of QF9. The mean for QF7 is low and its standard deviation is high. This reinforces the conclusion that sample purity is important and also points up the variability in sample purity experienced in this work. The remaining Quality Factors are all approaching one with standard deviations approaching zero. This is a reflection of the increasing degree of control held by the spectroscopist over the related variables.

The four most important Quality Factors are QF5, QF9, QF7 and QF3. With the exception of QF9, these all relate in one way or another to sample purity. This permits the unsurprising conclusion that mechanical parameters, such as spectrometer tuning and calibration, can be adequately controlled, but chemical parameters such as sample purity present a more serious obstacle in the measurement of high quality reference mass spectra.

Table 2. Values of quality factors

Quality Factor   Mean Value   Standard Deviation   Feature tested
QF5              0.871071     0.205208             Number of peaks
QF9              0.900359     0.037644             SI of calibration MS
QF7              0.964916     0.140553             Sample purity
QF3              0.974518     0.127268             Illogical neutral losses
QF4              0.974783     0.085014             Isotopic abundances
QF6              0.982433     0.037247             Lower mass limit
QF2              0.990063     0.080270             Peaks above MW
QF8              0.998406     0.012552             Calibration date
QF1              1.000000     0.000000             Electron voltage


The materials used in this work were purchased from chemical supply houses. In general, the minimum quantity available was purchased, but in many cases, this was 1 kg. The average cost of acquisition for each of 2770 samples was $21.

Some samples required only assay, having been supplied sufficiently pure for analysis. In other cases, recrystallization or redistillation was required. Over the batch of 2770 samples, an average cost of purification of about $40 was registered. This includes labor costs and the cost of expendable materials, such as solvents. The amount of material purchased was always well in excess of that required for mass spectral measurement. Consequently, the program managers have taken the opportunity to use the excess chemicals to begin the building of a repository of commercial chemicals. This will be used for the measurement of other reference values such as infrared spectra.

For an initial batch of 1398 mass spectra, measured with a Hewlett-Packard 5985 gas chromatograph-quadrupole mass spectrometer, the average cost for a final spectrum and total ion current record was $52.

The average total direct cost of obtaining a mass spectrum on a sample of known, high purity is then ($52 + $40 + $21) or $113. To this must be added the overhead cost of the laboratory, which in this case was approximately $130. The overhead covers the cost of amortization of the equipment, the salaries of supervisory staff and similar indirect costs.

The total per spectrum cost then, is $243 and this figure may be compared with commercial service laboratory charges such as $100 for a direct probe mass spectrum and $275 plus $15 per component for a gas chromatograph mass spectrum,21 rates which appear to be quite competitive but which do not of course include the $21 average cost of sample acquisition.
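The cost figures above can be reproduced with a few lines of Python. The variable names are hypothetical; the dollar amounts are the averages reported in the text.

```python
# Average costs per spectrum in US dollars, as reported in the text
measurement = 52     # final spectrum and total ion current record
purification = 40    # assay, recrystallization or redistillation
acquisition = 21     # purchase of the sample
overhead = 130       # amortization, supervisory salaries, other indirect costs

direct = measurement + purification + acquisition
total = direct + overhead
print(direct, total)
```

Overhead therefore accounts for slightly more than half of the total cost per spectrum.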


1. The agencies involved are the Environmental Protection Agency (EPA), the National Institutes of Health (NIH), the National Bureau of Standards (NBS), and the Food and Drug Administration (FDA).

2. S. R. Heller, R. J. Feldmann, H. M. Fales and G. W. A. Milne, A Conversational Mass Spectral Search System. IV. The Evolution of a System for the Retrieval of Mass Spectral Information. J. Chem. Doc. 13, 130 (1973).

3. S. R. Heller, H. M. Fales and G. W. A. Milne, A Conversational Mass Spectral Search and Retrieval System. II. Combined Search Options. Org. Mass Spectrom. 7, 107 (1973).

4. EPA-NIH Mass Spectral Database Magnetic Tape. Office of Standard Reference Data, NBS, Washington, DC 20234.

5. S. R. Heller and G. W. A. Milne, The NIH-EPA Mass Spectral Handbook. NBS-NSRDS 63 (4634 pp). Government Printing Office. Ordering Number, 003-003-01987-9; Supplement #1, (1292 pp.). Government Printing Office. Ordering Number, 003-003-02268-3.

6. G. W. A. Milne and S. R. Heller, The NIH-EPA Chemical Information System. J. Chem. Inf. Comput. Sci. 20, 204 (1980).

7. See, for example, V. P. Williams, D. K. Ching and S. D. Cederbaum, Clin. Chem. 25, 1814 (1979).

8. See, for example: W. D. Bowers, M. L. Parsons, R. E. Clement, G. A. Eiceman and F. W. Karasek, J. Chromatogr. 206, 279 (1980).

9. S. R. Heller, G. W. A. Milne and R. J. Feldmann, Quality Control of Chemical Data Bases. J. Chem. Inf. Comput. Sci. 16, 232 (1976).

10. G. W. A. Milne, S. R. Heller, A. E. Fein, E. Fein, E. F. Frees, R. G. Marquart, J. A. McGill, J. A. Miller and D. S. Spiers, The NIH/EPA Structure and Nomenclature Search System (SANSS). J. Chem. Inf. Comput. Sci. 18, 181 (1978).

11. See, for example: K. Biemann, Mass Spectrometry, Organic Chemical Applications, p. 162. McGraw-Hill, New York (1962); H. Budzikiewicz, C. Djerassi and D. H. Williams, Mass Spectrometry of Organic Compounds, p. 8. Holden-Day, San Francisco (1967).

12. D. D. Speck, R. Venkataraghavan and F. W. McLafferty, Org. Mass Spectrom. 13, 209 (1978).

13. J. W. Eichelberger, L. E. Harris and W. L. Budde, Anal. Chem. 47, 995 (1975).

14. S. R. Heller, D. A. Koniver, H. M. Fales and G. W. A. Milne, A Conversational Mass Spectral Search System. III. Display and Plotting of Spectra and Dissimilarity Comparison. Anal. Chem. 46, 947 (1974). The SI algorithm described in this paper has since been replaced, for the sake of internal consistency, with the algorithm written by Hertz et al. [Anal. Chem. 43, 681 (1971)], which is in routine use elsewhere in the MSSS.

15. The spectra which were removed at this point also included some 5000 mass spectra which were under a private copyright.

16. Output in MSSS includes the CAS Collective Index (8CI or 9CI) name(s) for the compound, followed by the first three or four names from the alphabetized list of synonyms, for a total of five names or as many as are available. This arbitrary number of names output can be changed by the user.

17. Toxic Substances Control Act of 1976, Public Law 94-469.

18. EPA Environmental Monitoring Support Laboratory, Cincinnati, Ohio 45219, USA. Contract #68-03-2879 with the Radian Corporation, Austin, Texas, USA.

19. EPA Management Information and Data Systems Division, Washington, DC, 20460, USA. Contract #68-01-4831 with Fein-Marquart Associates, Baltimore, Maryland, USA.

20. Spectra with QI = 0 were not excluded from these calculations.

21. Shrader Laboratories Inc., 3450 Lovett, Detroit, Michigan 48210, USA.

Received 26 February 1982; accepted 25 May 1982