THE NIH/EPA CHEMICAL INFORMATION SYSTEM (CIS)



S. R. Heller* and G. W. A. Milne**



*Environmental Protection Agency, PM-218, Washington, DC 20460, USA

**NHLBI, NIH, Bethesda, MD 20205, USA



INTRODUCTION



The NIH/EPA Chemical Information System (CIS) has been under development since

1971 as a joint project between the US National Institutes of Health (NIH) and the

US Environmental Protection Agency (EPA), in collaboration with the US National

Bureau of Standards (NBS), the US Food and Drug Administration (FDA), and the US

National Institute for Occupational Safety and Health (NIOSH).



This paper describes the CIS, with emphasis on its potential uses, and highlights

the linking capability that allows transparent transfer between CIS components

so that the user can obtain the data of particular interest.

The CIS consists of a collection of chemical data bases together with a battery

of computer programs for interactive searching through these disk-stored data bases.

In addition, the CIS has a data referral capability as well as a data analysis software system. It can be thought of, then, as having five main areas:



1. Numeric Chemical/Physical Property Data Bases

2. Regulatory and Toxicology Data Bases

3. Structure and Nomenclature Search System

4. Data Base Referral

5. Data Analysis



The numeric chemical/physical property data bases that are part of the CIS

include files of mass spectra, carbon-13 nuclear magnetic resonance spectra, and X-ray

diffraction data for single crystals and powders. The regulatory and toxicology

data bases include files of acute toxicity data, citations to chemicals in the US

Federal Register, citations to chemicals found in US and other waters, and

toxicology data on commercial products. The structure and nomenclature search

system (SANSS) contains the chemical names (including almost all synonyms and trade

names) of all chemicals registered under the Toxic Substances Control Act (TSCA) as

being in commerce in the US, and of chemicals covered by related legislation in the pesticide and drug

areas. Furthermore, the system contains the chemical structure and formula for these

entries, so the user can search for a chemical of interest in a variety of ways.

At present SANSS has about 200,000 chemicals and more than 500,000

associated names. Besides supporting searches for these chemical structures (or for parts of chemical structures, usually called substructures), SANSS has a referral capability. This means that within SANSS, for each of the 200,000 chemicals, there is a referral list stating in which US Government files (and other files, books, and so forth) the chemical can be found. Thus a chemical may be shown to be a registered pesticide and also to be cited in the Merck Index.
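The referral list described above can be pictured as a small lookup structure. The following Python sketch is only an illustration under stated assumptions: the record layout, the names `SubstanceRecord`, `NAME_INDEX` and `where_listed`, and the sample referral entries are hypothetical and are not taken from the actual SANSS implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SubstanceRecord:
    """One hypothetical SANSS-style entry: names plus a referral list."""
    cas_rn: str
    names: list[str]
    referrals: list[str] = field(default_factory=list)


# Illustrative entry only; the referral list is made up for this sketch.
RECORDS = [
    SubstanceRecord(
        cas_rn="110-86-1",
        names=["pyridine", "azabenzene"],
        referrals=["Federal Register citations", "Merck Index"],
    ),
]

# Index every name and synonym (case-insensitively) to its record.
NAME_INDEX = {name.lower(): rec for rec in RECORDS for name in rec.names}


def where_listed(name: str) -> list[str]:
    """Return the referral list for a name, or an empty list if unknown."""
    rec = NAME_INDEX.get(name.strip().lower())
    return list(rec.referrals) if rec else []
```

Calling `where_listed("Pyridine")` on this toy data returns the two illustrative referral entries, while an unknown name returns an empty list.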

The analytical programs include a family of statistical analysis and mathematical modeling algorithms, and programs for the second-order analysis of NMR spectra and for energy minimization of chemical conformations.



The above CIS components are available only in the private sector for use by the government, industry, universities and the public through an annual subscription fee of $300, from which educational institutions are exempt, and an hourly connect charge, as shown below. The CIS uses the Telenet network, and thus is available throughout the US, Canada, Bermuda, most of Western Europe, Japan, Israel, Australia, Hong Kong and Singapore via a local phone call. In the US and some other countries, both 300 and 1200 baud service are available using any standard terminal. The hourly connect rate is either $36 or $60 in the US and Canada. In European countries users are given a $12 per hour credit, and their communication costs are billed directly to the user by the local national PTT. In Japan the prices are fixed by the local distributor and are higher. The hourly rates include all the processing and connect time accumulated in a program.



$36/Hour Components: MSSS, OHM-TADS, RTECS, TSCAP&P, CNMR, CRYST, WDROP, NMRLIT and XTAL.



$60/Hour Components: SANSS, PDSM, FRSS, CHEMLAB (formerly CAMSEQ-II), CTCP and MLAB.
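As a rough worked example of the charges quoted above, the following Python sketch estimates a year's cost from the hours spent in each component. The function name `yearly_cost` and its parameters are hypothetical. The sketch assumes that the $12 per hour European credit is simply subtracted from the hourly rate, and it leaves out the communication costs billed separately by the local PTT as well as the Japanese distributor prices, which are not given here.

```python
# Component groups and US/Canada hourly rates (US dollars) as listed above.
RATE_36 = ["MSSS", "OHM-TADS", "RTECS", "TSCAP&P", "CNMR",
           "CRYST", "WDROP", "NMRLIT", "XTAL"]
RATE_60 = ["SANSS", "PDSM", "FRSS", "CHEMLAB", "CTCP", "MLAB"]
HOURLY_RATE = {**dict.fromkeys(RATE_36, 36), **dict.fromkeys(RATE_60, 60)}

ANNUAL_FEE = 300            # waived for educational institutions
EUROPE_CREDIT_PER_HOUR = 12  # assumption: applied as a flat hourly reduction


def yearly_cost(hours_by_component, educational=False, european=False):
    """Estimate subscription plus connect charges for one year.

    hours_by_component maps a component name to connect hours used.
    PTT communication costs for European users are NOT included.
    """
    credit = EUROPE_CREDIT_PER_HOUR if european else 0
    connect = sum(
        hours * (HOURLY_RATE[component] - credit)
        for component, hours in hours_by_component.items()
    )
    subscription = 0 if educational else ANNUAL_FEE
    return subscription + connect
```

For instance, two hours in SANSS and one and a half hours in MSSS come to 300 + 2 x 60 + 1.5 x 36 = 474 dollars for a US commercial subscriber under these assumptions.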



The outline shown in Figure 1 can be used to describe the CIS. The CIS is centered on chemical substances, specifically on the CAS Registry Number (CAS RN), the unique chemical identifier for any chemical substance. Using the CAS RN, one is able to go back and forth between CIS components to obtain information pertaining to the compound. This unique linking capability is readily seen in Figures 2 and 3, which are similar to Figure 1, except that each highlights a data base area of the CIS. In Figure 2, the SANSS component is showing the structure, molecular formula, CAS RN and some names for pyridine, and the Federal Register component is showing one particular citation (out of over 50) to pyridine. To go between the two components, as shown below, requires almost no effort. In Figure 3, the SANSS component is showing the structure of 2-ethylthiopyridine, its CAS RN, molecular formula and some additional names. In addition, as pictorially shown, the CAS RN is used as a "link" to go to the mass spectral data and give a bar plot of the mass spectrum of 2-ethylthiopyridine.
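The role of the CAS RN as a common key can be sketched in a few lines of Python. This is a minimal illustration, not the CIS software: each component is modeled as a plain dictionary keyed by CAS RN, the function name `follow_link` is hypothetical, and the Federal Register entries are placeholders rather than real citations.

```python
# Each hypothetical component is a mapping keyed by CAS RN.
SANSS = {
    "110-86-1": {"formula": "C5H5N", "names": ["pyridine"]},
}
FRSS = {
    # Placeholder strings standing in for Federal Register citations.
    "110-86-1": ["<citation 1>", "<citation 2>"],
}
MSSS = {}  # no mass spectrum loaded in this toy example

COMPONENTS = {"SANSS": SANSS, "FRSS": FRSS, "MSSS": MSSS}


def follow_link(cas_rn, components=COMPONENTS):
    """Collect whatever each component holds for a single CAS RN."""
    return {
        name: data_base[cas_rn]
        for name, data_base in components.items()
        if cas_rn in data_base
    }
```

Because every component shares the same key, moving from a SANSS entry to the corresponding Federal Register records is a single lookup, and a component with no entry for that CAS RN simply drops out of the result.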



Because of the modular nature of the CIS, it is possible to build individual components elsewhere for later inclusion in the system, so long as the CAS RN is available for linking them into the system architecture. Also, since the CAS RN is used within the SANSS component, it makes the CIS an excellent jumping-off point for searches on a particular chemical or class of chemicals. With more information becoming available about chemicals, and an even greater increase in the number of source locations for that information, keeping track of it can be a complex task. The SANSS, with its pointer referral list capability, goes a considerable distance towards solving this problem.



CONCLUSION



It is hoped that this approach to handling chemical information will prove useful to the chemical community and that further data bases will be added to the CIS along with additional links to the existing commercial bibliographic data bases.



SUGGESTED REFERENCES:



For details on obtaining access to the NIH/EPA CIS, please contact Kay Pool, CIS Project Manager, ISC, 2135 Wisconsin Ave., NW, Washington, DC 20007 (202-298-6200 or 800-424-2722).

For those interested in the CIS status reports, issued every June and December, please contact either author to be put on the mailing list.



Further details on the SANSS component of CIS can be found in the Journal of Chemical Information and Computer Sciences, Volume 18, pages 181-186, November 1978.



Further details on the CAS registration can be found in the Proceedings of the 6th CODATA Conference, Pergamon Press, pages 137-143, 1979.



Further details on the FRSS component of CIS can be found in Online, Volume 4, pages 45-49, April 1980.



Further details on the spectroscopic data bases can be found in American Laboratory, Volume 12, pages 33-48, March 1980.