The ARS Pesticide Properties Database - Using dBase for Scientific Databases

Stephen R. Heller, Douglas W. Bigwood, and Patricia Laster
USDA, Agricultural Research Service, Systems Research Laboratory, Beltsville, MD 20705-2350 USA

Keywords: Numeric Data, Data Quality, Groundwater, Expert Systems, PC Database Management Systems

Abstract: The dBaseIII+/dBaseIV database management software was chosen for the creation, maintenance, and searching of a major pesticide properties database for groundwater quality and other environmental studies. This paper reports on the system design, on the cooperation between government and industry in developing the database over the past year, and on the use of dBase for the project. The tradeoffs between the advantage of dBase (ease of distribution) and its technical disadvantages (speed, limited report writing, and limited ad hoc searching) are described.



INTRODUCTION

Groundwater quality is an issue which is receiving attention because of the potential hazard associated with chemical contamination of our nation's water supplies. The United States depends on farmers and the agricultural industry to maintain current production levels of food and fibre. As a result, farmers make extensive use of fertilizers, pesticides, and other chemicals to control diseases and promote plant and animal growth. The negative aspect of the use of these chemicals is their possible transport into groundwater, which supplies about 25% of the nation's fresh water. Accurate information on the possible contamination of groundwater by agro-chemicals is needed for intelligent planning of agricultural activities by management, government, and industry. Such information can only be obtained by creating and testing various hypotheses which attempt to describe chemical mobility. This can be accomplished by designing computer models which integrate the latest scientific knowledge, allowing the environmental impact of pesticides to be simulated through different mechanisms. Among these mechanisms are physical, management, crop growth, nutrient, soil chemistry, and pesticide processes.

The pesticide processes include many physicochemical relationships. In order to simulate these processes, complete and accurate data on the properties of each chemical used must be available for input into the model. Without complete and accurate data, the accuracy, and hence the value, of the model predictions is severely reduced.

At present there is no definitive database of properties of chemicals used as pesticides. There are a number of compilations of data on pesticides, including handbooks which are devoted either completely or partially to data on pesticides. However, these sources are incomplete and also lack criteria for defining the quality of the data reported. In addition, the data are often reported without mentioning the temperature at which the parameters were obtained or the original source. Table 1 shows an example of the variation of solubility data for one particular pesticide (Ref. 1). In searching the literature we have uncovered numerous other examples of such data quality problems (Ref. 2). These sources also have data gaps which limit their usefulness for many modeling activities. This ARS pesticide data research project is designed to produce a pesticide database containing the highest quality data available, with all of the pertinent parameters needed to address the problem at hand.



APPROACH

A definitive database of pesticide properties is being created with a well defined, step-wise systems approach. The steps are as follows:

1. Establish a Pesticide Properties Database (PPD) National Research Project Coordination Team (NRPCT), in cooperation with the National Agricultural Chemicals Association (NACA).

2. Design the database, search the literature, obtain data from industry and other sources, and enter the data into the database using an IBM PC and dBaseIII/dBaseIV.

3. Evaluate the data (accuracy, quality, and completeness) according to established or newly developed criteria. Data will be evaluated in a defined and (hopefully) objective manner, using a series of computer-based expert systems which either exist or will be developed for that purpose.

4. Fill in the data gaps by calculating, where possible, missing properties using existing or newly developed theoretical techniques which have been validated for the various classes of compounds found in the database.

5. Disseminate the database at low cost through an organization acceptable to all parties participating in the project.

6. Maintain and update the database at the USDA, ARS Systems Research Laboratory.

STATUS

1. Establishment of a PPD National Research Project Coordination Team (NRPCT), in cooperation with the National Agricultural Chemicals Association (NACA).

A team of scientists from ARS, other government agencies, and industry has been established to provide leadership and facilitate coordination. NACA has been instrumental in getting this team together. The team consists of scientists at the Systems Research Laboratory and the Pesticide Degradation Laboratory in Beltsville, MD, and the Southeast Watershed Research Laboratory in Tifton, GA; scientists from other government agencies, including USDA/SCS, USDA/ES, EPA, and USGS; and scientists from universities and industry (from companies belonging to NACA). A representative from the International Union of Pure and Applied Chemistry (IUPAC) Pesticide Commission will also be asked to serve on the team. IUPAC participation is important both for the evaluation of the database and for its distribution to the scientific community. The team will use the Systems Research Laboratory's Agricultural Systems Research Resource (ASRR) computer conferencing software for continuing discussions. The conference "PESTICIDE DATABASE" has been established and all interested parties are being invited to join the conference, which can be accessed by direct dial, Telenet, or Arpanet.

The PPD NRPCT will approve the plans submitted to the group by the Team Leader and generally oversee the activities to assure that they are on target, on time, and are supporting ARS objectives and other groundwater needs of the USA. When necessary, the PPD NRPCT will meet, although most of the work is expected to be done via computer conferencing. To date all of the meetings have been at the NACA headquarters in downtown Washington DC, while those unable to attend have been able to keep abreast of the activities using the electronic teleconferencing computer system.



2. Design the database and search the literature, obtain data from other sources, and enter the data into the database.

The first 50 compounds have been chosen from a prioritized list of compounds being studied by the ARS and entered into the database. This list of chemicals is shown in Table 2. Many additional compounds are to be added in the next year as data are received from members of NACA who have agreed to voluntarily contribute data to the project. The data being submitted by industry to this project are the same as those provided to the states of California and Arizona under those states' pesticide registration laws. The initial list of some 40 properties (of which about 12 are of highest priority) to be included in the database has been chosen, and the information associated with each item has been defined. This information includes chemical names, the CAS Registry Number (CAS RN), and other identifiers; variables (temperature, pressure, pH, and so forth); and important chemical properties such as solubility, vapor pressure, various partition coefficients, and dissipation rate constants. Details on the data elements, format, and access to the ARS ASRR computer system are available upon request from the senior author (SRH).
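The data elements described above can be pictured as a record that pairs each property value with the conditions under which it was measured. The following Python sketch is illustrative only: the class and field names (`Measurement`, `PesticideRecord`, `temperature_c`, and so on) are assumptions made for this example and are not the actual PPD field definitions, which are available from the senior author.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Measurement:
    """One reported property value together with its measurement conditions."""
    property_name: str                      # e.g. "aqueous solubility"
    value: float
    units: str                              # e.g. "mg/L"
    temperature_c: Optional[float] = None   # None means "not reported"
    ph: Optional[float] = None              # None means "not reported"
    source: Optional[str] = None            # citation or company report


@dataclass
class PesticideRecord:
    """Identifiers for one compound plus all measurements collected for it."""
    common_name: str
    cas_rn: str
    measurements: List[Measurement] = field(default_factory=list)

    def values_for(self, property_name: str) -> List[Measurement]:
        """Return every stored measurement of the named property."""
        return [m for m in self.measurements
                if m.property_name == property_name]


# Example using two fenthion solubility entries taken from Table 1.
fenthion = PesticideRecord("Fenthion", "55-38-9")
fenthion.measurements.append(
    Measurement("aqueous solubility", 4.2, "mg/L",
                temperature_c=20.0, source="Mobay Report"))
fenthion.measurements.append(
    Measurement("aqueous solubility", 55.0, "mg/L", source="Merck Index"))
```

Keeping the temperature and source next to each value, rather than in a separate note, makes it straightforward to see which entries lack the supporting information needed for evaluation.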



dBase Advantages

The dBaseIII/dBaseIV database management system, coupled with the Clipper software, which compiles dBase code, has been chosen for implementation of data entry, updating, search, and retrieval. The major reasons for the choice of dBaseIII/dBaseIV are that it is widely used and that, by using Clipper, we are able to provide a royalty-free search system for IBM PC compatible computers. This was the critical factor in deciding which software to use. Other software would have required royalty fees and would have involved other non-technical complications which would have made the world-wide dissemination of the PPD more difficult. For example, the encryption aspects of the dBase software would have severely limited the ability to distribute the database to many countries. This same conclusion was recently reached by the International Union of Pure and Applied Chemistry (IUPAC) Committee on Chemical Database (CCDB), and their experience in using dBase has been satisfactory.



dBase Disadvantages

As with any software package, dBase has a number of limitations when applied to scientific databases and systems. The number of fields available is limited, so that a number of files must be set up to handle all the data elements. Having fixed field lengths is justified for administrative systems, but not for scientific data. For example, a good deal of disk space is wasted when the chemical name field length must be set to accommodate the longest chemical name in the database. Entering data using the dBase numeric field is necessary for searching, but creates a problem in printouts. If a parameter field is empty, the dBase numeric default is 0 (zero). Thus an empty field can easily be misinterpreted as a measured value of zero. Conversion of the data to character information for printout is a solution, but clearly involves additional time and storage.
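The blank-versus-zero problem can be illustrated outside dBase. The Python sketch below is not dBase code; the function name `format_value` and the placeholder text "no data" are assumptions chosen for this example. It keeps a missing value distinct from a true zero and converts to character form only at print time, which is the kind of workaround described above.

```python
from typing import Optional


def format_value(value: Optional[float], width: int = 10,
                 missing: str = "no data") -> str:
    """Render a numeric field for a printout.

    None stands for an empty field; it is printed as a text placeholder
    so that it cannot be mistaken for a measured value of 0.
    """
    if value is None:
        return missing.rjust(width)
    return f"{value:.2f}".rjust(width)


print(format_value(4.2))    # a measured value
print(format_value(0.0))    # a genuine zero
print(format_value(None))   # an empty field
```

The cost noted in the text still applies: the character form must be produced and stored or regenerated for every printout.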

3. Evaluate the data for accuracy, quality, and completeness according to established or newly developed criteria. The first of a series of computer based expert systems has been developed for the evaluation of aqueous solubility data. The expert system, called SOL, is an IBM PC based program, written using the NASA public domain CLIPS software. SOL is currently being tested by a dozen groups in government, academia, and industry. Arrangements are being made to distribute the SOL program through the National Institute of Standards and Technology (NIST, formerly known as the National Bureau of Standards (NBS)), Office of Standard Reference Data.
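The criteria applied by SOL are not described in this paper. Purely as an illustration of what rule-based evaluation of a solubility report can look like, the Python sketch below counts how many of three documentation criteria a report satisfies. The three rules, their equal weighting, and the function name `rate_solubility_report` are assumptions made for this example; they are not SOL's rules, and SOL itself is written in CLIPS, not Python.

```python
from typing import Optional


def rate_solubility_report(temperature_c: Optional[float],
                           conditions_reported: bool,
                           std_dev: Optional[float]) -> int:
    """Count how many of three documentation criteria a report satisfies."""
    score = 0
    if temperature_c is not None:   # a numeric temperature was stated
        score += 1
    if conditions_reported:         # experimental conditions were described
        score += 1
    if std_dev is not None:         # a standard deviation was given
        score += 1
    return score


# Three fenthion entries from Table 1, rated under these assumed rules.
mobay = rate_solubility_report(20.0, True, 0.19)
worthing_7th = rate_solubility_report(20.0, False, None)
merck_index = rate_solubility_report(None, False, None)
print(mobay, worthing_7th, merck_index)
```

A real evaluation would also have to judge the experimental method itself, which a simple checklist of this kind cannot do.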

4. As the database is still being established, it is too early to apply or develop property estimation techniques for supplying missing data values. Because the current state of the art in property prediction for complex chemicals (such as those found in the ARS PPD) is in its infancy, it is probably better to wait for the field to develop further before much effort is made in this area.



5. The first release of the database is expected in mid-1989. A preliminary database of some 50 chemicals has been distributed to PPD NRPCT and NACA members. As most of the data come from NACA industrial members, it has been agreed that they will have the opportunity to review their data before they are released.



6. A formal commitment to maintain and update the database was made late in 1988 by senior ARS management. This commitment has helped convince industry to expand their voluntary cooperation and will assure the project of a continuing source of data.



SUMMARY

To date, most of the effort has been expended in searching the literature and obtaining data from industry and other sources. Cooperation from industry has been essential to the project's progress. The need for data evaluation has been pointed up most strikingly by data on the solubility of many pesticides. In one particular case, it has been found that for almost 30 years the solubility of fenthion has been reported in handbooks as about 55 ppm, whereas the correct solubility (measured under proper experimental conditions) is actually 4.2 ppm. This difference of a factor of about 13 is too great to be ignored.

After a year of work on the first evaluated database of pesticide properties we feel that the goals of the project can be met, but that more time than originally anticipated will be required, as the available data are more difficult to obtain and are less reliable than first expected. An interim database of values obtained will be our first goal, followed closely by proper evaluation of the database, with appropriate data quality indicator ratings.





REFERENCES

1. Heller, S. R., Scott, K., and Bigwood, D. W., "The Need for Data Evaluation of Physical and Chemical Properties of Pesticides", J. Chem. Inf. Comput. Sci., submitted.

2. Heller, S. R., Bigwood, D. W., Schantz, M., Guenther, F. and May, W. E., "Expert Systems for Analytical Chemistry Properties - Aqueous Solubility", in preparation.





Table 1 - Chronological Order of Reported Solubility Values in units of mg/L for Fenthion

Value   Std. Dev.  Temperature (°C)   Experimental Conditions  Reference Name
54-56              Room Temperature   No                       Schrader
54-56              Room Temperature   No                       Gunther
54-56              Room Temperature   No                       Spencer
55                 Room Temperature   No                       Verschueren
54-56              Room Temperature   No                       Worthing/6th
56                 22                 No                       Khan
54                 None               No                       Eto
55                 None               No                       Merck Index
7.51    0.31       20                 Yes                      Bowman & Sans
2                  20                 No                       Worthing/7th
6.4                10                 Yes                      Bowman & Sans
9.3                20                 Yes                      Bowman & Sans
11.3               30                 Yes                      Bowman & Sans
54-56              Room Temperature   No                       Agrochem Hbk.
4.2     0.19       20                 Yes                      Mobay Report
50                 20                 No                       Mackay et al.
50                 20                 No                       Farm Chem Hbk.


Room Temperature was not defined in any reference, but is usually assumed to be from 15-25 °C.

The time frame for the references runs from 1960 to 1988.
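The spread in Table 1 can be summarized with a few lines of code. The Python sketch below transcribes the table and reports how many entries give experimental conditions and how far apart the highest and lowest values lie. Two simplifications are assumptions of this sketch: a reported range of 54-56 is entered as its midpoint, 55, and a temperature given as "Room Temperature" or "None" is stored as missing. The names `Row` and `TABLE_1` are likewise chosen only for this example.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    value: float                    # mg/L; a range "54-56" is entered as 55
    temperature_c: Optional[float]  # None for "Room Temperature" or "None"
    conditions: bool                # experimental conditions reported?
    reference: str


TABLE_1: List[Row] = [
    Row(55.0, None, False, "Schrader"),
    Row(55.0, None, False, "Gunther"),
    Row(55.0, None, False, "Spencer"),
    Row(55.0, None, False, "Verschueren"),
    Row(55.0, None, False, "Worthing/6th"),
    Row(56.0, 22.0, False, "Khan"),
    Row(54.0, None, False, "Eto"),
    Row(55.0, None, False, "Merck Index"),
    Row(7.51, 20.0, True, "Bowman & Sans"),
    Row(2.0, 20.0, False, "Worthing/7th"),
    Row(6.4, 10.0, True, "Bowman & Sans"),
    Row(9.3, 20.0, True, "Bowman & Sans"),
    Row(11.3, 30.0, True, "Bowman & Sans"),
    Row(55.0, None, False, "Agrochem Hbk."),
    Row(4.2, 20.0, True, "Mobay Report"),
    Row(50.0, 20.0, False, "Mackay et al."),
    Row(50.0, 20.0, False, "Farm Chem Hbk."),
]

# Entries that state their experimental conditions.
documented = [r for r in TABLE_1 if r.conditions]

# Ratio of the highest to the lowest value reported anywhere in the table.
spread = max(r.value for r in TABLE_1) / min(r.value for r in TABLE_1)

# Highest value among the entries that state their conditions.
highest_documented = max(r.value for r in documented)

print(f"{len(documented)} of {len(TABLE_1)} entries report conditions")
print(f"highest/lowest reported value: {spread:.0f}x")
print(f"highest documented value: {highest_documented} mg/L")
```

Under these assumptions, every entry that states its experimental conditions falls well below the 50-56 mg/L values repeated in the handbooks.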











Table 2 - Pesticides Currently in the ARS PPD Database
Compound Name CASRN
Alachlor 15972-60-8
Aldicarb 116-06-3
Ametryn 834-12-8
Amitrole 61-82-5
Anilazine 101-05-3
Sodium Asulam 3337-71-1
Atrazine 1912-24-9
Methyl Azinphos 86-50-0
Bromacil 314-40-9
Bromoxynil Butyrate 3861-41-4
Bromoxynil Octanoate 1689-99-2
Butifos 78-48-8
Carbaryl 63-25-2
Carbofuran 1563-66-2
Chlordimeform 6164-98-3
Chlorobenzilate 510-15-6
Chloroxuron 1982-47-4
Cyanazine 21725-46-2
Cyfluthrin 68359-37-5
Cypermethrin 52315-07-8
Cyromazine 66215-27-8
DCPA 1861-32-1
Diazinon 333-41-5
Dipropetryn 4147-51-7
Disulfoton 298-04-4
Endosulfan 115-29-7
Ethion 563-12-2
Fenthion 55-38-9
Fosetyl-al 39148-24-8
Isofenphos 25311-71-1
Metalaxyl 57837-19-1
Methamidophos 10265-92-6
Methidathion 950-37-8
Methiocarb 2032-65-7
Methyl Oxydemeton 301-12-2
Permethrin 52645-53-1
Phenamiphos 22224-92-6
Phosphamidon 13171-21-6
Profenofos 41198-08-7
Prometon 1610-18-0
Prometryn 7287-19-6
Propoxur 114-26-1
Quinomethionate 2439-01-2
Sulprofos 35400-43-2
Terbutryn 886-50-0
Triadimefon 43121-43-3
Trichlorfon 52-68-6
Iprodione 36734-19-7
Phosalone 2310-17-0
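CAS Registry Numbers carry a built-in check digit, which gives a simple guard against keying errors during data entry. The final digit equals the sum of the preceding digits, each multiplied by its position counted from the right, taken modulo 10. The Python sketch below applies that rule; the function name `cas_check_digit_ok` is an assumption for this example, and nothing in this paper states that the PPD data entry programs perform such a check.

```python
def cas_check_digit_ok(cas_rn: str) -> bool:
    """Return True if the final digit of a CAS RN matches its checksum."""
    parts = cas_rn.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return False
    # Expected layout: leading digits, then two digits, then one check digit.
    if len(parts[1]) != 2 or len(parts[2]) != 1:
        return False
    body = parts[0] + parts[1]        # every digit except the check digit
    # Weight the digits 1, 2, 3, ... starting from the rightmost body digit.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == int(parts[2])


print(cas_check_digit_ok("55-38-9"))     # Fenthion, as listed in Table 2
print(cas_check_digit_ok("1912-24-9"))   # Atrazine, as listed in Table 2
print(cas_check_digit_ok("1912-24-8"))   # deliberately altered check digit
```

A passing check digit shows only that the number is well formed; it does not confirm that the number belongs to the compound named beside it.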