A COMPUTER-BASED CHEMICAL INFORMATION SYSTEM



S. R. Heller (EPA)*, G. W. A. Milne (NIH) and R. J. Feldmann (NIH)



* EPA/MIDSD (PM-218), U.S. Environmental Protection Agency, Washington, D.C. 20460, U.S.A.



A series of data collections stored in a central computer can be searched in real time and at low cost by chemists in North America and Europe.



The NIH-EPA Chemical Information System (CIS) consists of a series of numerical and bibliographic data bases, together with a battery of interactive, conversational computer programs that can be used to search for and retrieve information from any of the data bases. The CIS also includes interactive programs for analysis of the data, either to reduce them to a form in which they can be used in the searching programs or as an end in itself.

All of the CIS components run on a DEC PDP10 time-sharing computer, which is the only system whose hardware is truly oriented to large-scale time-sharing. Furthermore, as is evidenced by the rapidly increasing number of computer service companies using this machine, the PDP10 is probably the most cost-effective computer of its kind.



The data bases that make up the CIS include mass spectra, carbon-13 NMR spectra, X-ray diffraction data for organic and inorganic molecules, and X-ray powder diffraction patterns. All of these files can be searched structurally; that is, they can be examined for the presence of a given chemical structure or sub-structure. The files are linked together by means of Chemical Abstracts Service (CAS) Registry Numbers, which are unique chemical identifiers. Whenever a specific compound appears in any of the files, it is accompanied by its own unique Registry Number, which can then be used to find the same compound in the other files.
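The linking of files through Registry Numbers can be pictured with a minimal Python sketch. It is an illustration only and does not reflect how the CIS programs are implemented: the file names, the `FILES` table, and the `files_containing` function are hypothetical, and the record contents are placeholders. The Registry Numbers 71-43-2 (benzene) and 58-08-2 (caffeine) are real, but which files hold them here is invented for the example.

```python
from typing import Dict, List

# Hypothetical stand-ins for three CIS files. Each maps a CAS Registry Number
# to a record; the record contents and file membership are placeholders,
# not actual CIS data.
FILES: Dict[str, Dict[str, str]] = {
    "mass_spectra": {"71-43-2": "record A", "58-08-2": "record B"},
    "carbon13_nmr": {"71-43-2": "record C"},
    "xray_powder": {"58-08-2": "record D"},
}


def files_containing(registry_number: str) -> List[str]:
    """Return the sorted names of all files holding this Registry Number."""
    return sorted(
        name for name, records in FILES.items() if registry_number in records
    )


if __name__ == "__main__":
    # A compound found in one file can be followed into the others.
    print(files_containing("71-43-2"))
```

Because every file is keyed on the same identifier, a compound located in one file can be followed into any other file without a further structure search.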



The three bibliographic files deal with the literature of mass spectrometry, X-ray diffraction of organic molecules, and gas phase proton affinities.



Finally, the analytical programs that are available can carry out the iterative analysis of complex NMR spectra, as well as general curve fitting and linear regression analysis. Other programs can be used to calculate isotopic enrichment from mass spectral data or, for a given molecule, to find the conformation with the lowest energy.
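As a generic illustration of the linear regression analysis mentioned above, the following Python sketch fits a straight line by ordinary least squares. The algorithms used by the CIS programs are not described here, so this should be read as a textbook example and not as the CIS method; the function name `linear_fit` and the sample data are assumptions made for the example.

```python
from typing import Sequence, Tuple


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and of cross deviations in x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Made-up points lying exactly on y = 2x + 1.
    print(linear_fit([0, 1, 2, 3], [1, 3, 5, 7]))
```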



The development work necessary to generate each of these components has been underwritten by agencies of the U.S. Government (NIH, EPA, NBS, ERDA, and FDA), of which the first two have played major roles. Once a component has been assembled and tested on Government computers, it is made available to the private sector, where it can be disseminated to the international scientific community on a fee-for-service basis by a computer network service company. In this operational stage, the U.S. Government takes no direct role or responsibility. Instead, each component of the CIS is managed by some other organization, which is responsible for the annual disk storage charges and attempts to recoup them through a subscription fee system. The mass spectral search system, for example, is managed by the Mass Spectrometry Data Centre, a branch of the Department of Industry of the British Government, and the Carbon-13 NMR Search System is managed by the Netherlands Organization for Chemical Information, a group supported by the Royal Dutch Chemical Society.



Examples of CIS components will be presented and the use and acceptance of the CIS will be discussed in detail.