The ARS Pesticide Properties Database - Using dBase for Scientific
Databases
Stephen R. Heller, Douglas W. Bigwood, and Patricia Laster
USDA, Agricultural Research Service, Systems Research Laboratory,
Beltsville, MD 20705-2350 USA
Keywords: Numeric Data, Data Quality, Groundwater, Expert Systems, PC
Database Management Systems
Abstract:  The dBaseIII+/dBaseIV database management software was chosen
for the creation, maintenance, and searching of a major pesticide
properties database for groundwater quality and other environmental
studies.  The system design, experience with cooperation between
Government and industry in developing this database over the past year,
and experience with dBase in this project are reported.  The
tradeoffs between the advantages of dBase (distribution) and the technical
disadvantages of this software (speed, limited report writing and ad-hoc
searching) are described.  
INTRODUCTION 
      Groundwater quality is an issue which is receiving attention because
of the potential hazard associated with chemical contamination of our
nation's water supplies.   The United States depends on farmers and the
agricultural industry to maintain current production levels of food and
fibre.  As a result, farmers make extensive use of fertilizers, pesticides,
and other chemicals to control diseases and promote plant and animal
growth.  The negative aspect of the use of these chemicals is their
possible transport into groundwater which supplies about 25% of the
nation's fresh water.  Accurate information on the possible contamination
of groundwater by agro-chemicals is needed for intelligent planning of
agricultural activities by management, government, and industry.  Such
information can only be obtained by creating and testing hypotheses
which attempt to describe chemical mobility.  This can be accomplished by
designing computer models which incorporate the latest scientific
knowledge and allow the environmental impact of pesticides to be
simulated through different mechanisms.  Among these mechanisms are
physical, management, crop growth, nutrient, soil chemistry, and pesticide
processes.  The pesticide processes include many physicochemical
relationships.  In order to simulate these processes, complete and accurate
data on the properties of each chemical used must be available for input
into the model.  Without such data, the accuracy, and hence the value, of
the model predictions is severely reduced.
      At present there is no definitive database of properties of chemicals
used as pesticides.  There are a number of compilations of data on
pesticides, including handbooks which are either devoted completely or
partially to data on pesticides.  However, these sources are incomplete and
also lack criteria for defining the quality of the data reported.  In
addition, the data are often reported without stating the temperature at
which the values were measured or the original source.  Table 1 shows
an example of the variation of solubility data for one particular pesticide
(Ref. 1).  In searching the literature we have uncovered numerous other
examples of such data quality problems (Ref. 2).  These compilations also
have data gaps which limit their usefulness for many modeling activities.
This ARS pesticide data research project is designed to produce a
pesticide database containing the highest quality data available, together
with all of the parameters pertinent to the problem at hand.
APPROACH 
      A definitive database of pesticide properties is being created with a
well defined, step-wise systems approach.  The steps are as follows:
	1. Establish a Pesticide Properties Database (PPD) National Research
Project Coordination Team (NRPCT), in cooperation with the National
Agricultural Chemicals Association (NACA).  
	2. Design the database, search the literature, obtain data from
industry and other sources, and enter the data into the database
using an IBM PC and dBaseIII/dBaseIV.
	3. Evaluate the data (accuracy, quality, and completeness) according
to established or newly developed criteria.  Data will be evaluated
in a defined and (hopefully) objective manner, using a series of
computer-based expert systems which either exist or will be developed
for that purpose.
	4. Fill in the data gaps by calculating, where possible, missing
properties using existing or newly developed theoretical techniques
which have been validated for the various classes of compounds found
in the database.
	5. Disseminate the database at low cost through an organization
acceptable to all parties participating in the project.
	6. Maintain and update the database at the USDA, ARS Systems Research
Laboratory.
STATUS
	1. Establishment of a PPD National Research Project Coordination Team
(NRPCT), in cooperation with the National Agricultural Chemicals
Association (NACA).  
      A team of scientists from ARS, other government agencies, and
industry has been established to provide leadership and facilitate
coordination.  NACA has been instrumental in getting this team together. 
The team consists of scientists at the Systems Research Laboratory
and the Pesticide Degradation Laboratory in Beltsville, MD, the Southeast
Watershed Research Laboratory in Tifton, GA, scientists from other
government agencies including USDA/SCS, USDA/ES, EPA, and USGS, and
scientists from universities and industry (from companies belonging to
NACA).  A representative from the International Union of Pure and Applied
Chemistry (IUPAC) Pesticide Commission will also be asked to serve on the
team.  IUPAC participation is important as part of the evaluation of the
database and the distribution of the database to the scientific community. 
The team will use the Systems Research Laboratory's Agricultural Systems
Research Resource (ASRR) computer conferencing software for continuing
discussions.  The conference "PESTICIDE DATABASE" has been established and
all interested parties are being invited to join the conference, which can
be accessed by direct dial, Telenet, or Arpanet.
      The PPD NRPCT will approve the plans submitted to the group by the
Team Leader and generally oversee the activities to assure that they are on
target, on time, and are supporting ARS objectives and other groundwater
needs of the USA.  When necessary, the PPD NRPCT will meet, although most
of the work is expected to be done via computer conferencing.  To date all
of the meetings have been held at the NACA headquarters in downtown
Washington, DC; those unable to attend have kept abreast of the
activities through the computer conferencing system.
	2.  Design the database and search the literature, obtain data from
other sources, and enter the data into the database. 
      The first 50 compounds have been chosen and entered into the database
from a prioritized list of compounds being studied by the ARS.  This list
of chemicals is shown in Table 2. Many additional compounds are to be added
in the next year as data are received from members of NACA who have agreed
to voluntarily contribute data to the project.  The data being submitted by
industry to this project are the same as those provided to the states of
California and Arizona under those states' pesticide registration laws. 
The initial list of some 40 properties (of which about 12 are of highest
priority) to be included in the database has been chosen, and the
information associated with each item has been defined.  This includes
chemical names, CAS Registry Number (CAS RN), and other identifiers;
variables (temperature, pressure, pH, and so forth); and important
chemical properties such as solubility, vapor pressure, various partition
coefficients, and dissipation rate constants.  Details on the data
elements, format, and access to the ARS ASRR computer system are available
upon request from the senior author (SRH).
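
      As an illustration only, the kind of record implied by the data
elements listed above can be sketched in Python.  The class name, field
names, and helper function below are hypothetical; they are not the actual
PPD file layout, which is available from the authors on request.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PropertyValue:
    """One reported value of one property for one compound (hypothetical layout)."""
    cas_rn: str                             # CAS Registry Number, e.g. "55-38-9"
    property_name: str                      # e.g. "aqueous solubility"
    value: float
    units: str                              # e.g. "mg/L"
    temperature_c: Optional[float] = None   # None means "not reported"
    ph: Optional[float] = None              # None means "not reported"
    source: Optional[str] = None            # original reference, if known


def missing_context(rec: PropertyValue) -> List[str]:
    """List the context items (temperature, source) that were not reported."""
    missing = []
    if rec.temperature_c is None:
        missing.append("temperature")
    if rec.source is None:
        missing.append("source")
    return missing


documented = PropertyValue("55-38-9", "aqueous solubility", 4.2, "mg/L",
                           temperature_c=20.0, source="Mobay Report")
undocumented = PropertyValue("55-38-9", "aqueous solubility", 55.0, "mg/L")

print(missing_context(documented))
print(missing_context(undocumented))
```

Keeping the conditions and the source alongside each value makes the gaps
described in the Introduction (unreported temperature, unknown original
source) visible at the level of the individual record.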
dBase Advantages	
      The dBaseIII/dBaseIV computer database management system, coupled
with the Clipper software, which allows for the compilation of dBase code,
has been chosen for implementation of data entry, updating, search, and
retrieval.  The major reasons for the choice of dBaseIII/dBaseIV are that
it is widely used and that, by using Clipper, we are able to provide a
royalty-free search system for IBM PC compatible computers.  The latter
was the critical factor in deciding which software to use.  Other software
would have required royalty fees and would have involved other
non-technical complications which would have made the world-wide
dissemination of the PPD
more difficult.  For example, the encryption aspects of the dBase software
would have severely limited the ability to distribute the database to many
countries.  This same conclusion was recently reached by the International
Union of Pure and Applied Chemistry (IUPAC) Committee on Chemical Database
(CCDB), and their experience in using dBase has been satisfactory.
dBase Disadvantages
      As with any software package, dBase has a number of limitations when
applied to scientific databases and systems.  The number of fields
available is limited so that a number of files must be set up to handle all
the data elements.  Having fixed field lengths is justified for
administrative systems, but not for scientific data.  For example, a good
deal of disk space is wasted when the chemical name field length must be
set using the longest chemical name available.  Entering data using the
dBase numeric field is necessary for searching, but creates a problem in
printouts.  If a parameter field is empty, the dBase numeric default is 0
(zero).  Thus a blank can easily be misinterpreted as a measured value of
zero.  Conversion of the data to character information for printout
is a solution, but clearly involves additional time and storage.
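
      The blank-versus-zero problem can be illustrated outside dBase.  The
following is a minimal Python sketch, not the dBase or Clipper code used in
the project; the function name and the field width are assumptions made for
the example.  A missing value is kept distinct from zero and is converted
to a blank character field only at printout time.

```python
from typing import Optional


def format_numeric(value: Optional[float], width: int = 8, decimals: int = 2) -> str:
    """Render a numeric field for a printout; a missing value prints as blanks."""
    if value is None:
        # Nothing was ever entered: emit spaces rather than a misleading 0.
        return " " * width
    return f"{value:{width}.{decimals}f}"


measured_zero = 0.0    # a genuine measurement of zero
not_reported = None    # no value was entered for this field

print(repr(format_numeric(measured_zero)))
print(repr(format_numeric(not_reported)))
```

The cost noted above still applies: the character form has to be produced
and stored or generated for every printout.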
	3. Evaluate the data for accuracy, quality, and completeness
according to established or newly developed criteria. The first of a
series of computer-based expert systems has been developed for the
evaluation of aqueous solubility data.  The expert system, called
SOL, is an IBM PC-based program, written using the NASA public domain
CLIPS software.  SOL is currently being tested by a dozen groups in
government, academia, and industry.  Arrangements are being made to
distribute the SOL program through the National Institute of
Standards and Technology (NIST, formerly known as the National Bureau
of Standards (NBS)), Office of Standard Reference Data.
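
      The rules used by SOL are not given here.  Purely as an illustration
of rule-based screening, the Python sketch below sorts a reported
solubility value by how well it is documented, using only the kinds of
information tabulated in Table 1 (temperature, whether experimental
conditions were reported, and standard deviation).  The function name, the
rules, and the category labels are all hypothetical and should not be read
as the criteria implemented in SOL.

```python
from typing import Optional


def screen_solubility_report(temperature_c: Optional[float],
                             conditions_reported: bool,
                             std_dev: Optional[float]) -> str:
    """Toy documentation screen for one reported value (hypothetical rules)."""
    if temperature_c is None or not conditions_reported:
        return "insufficient documentation"
    if std_dev is None:
        return "documented, no precision estimate"
    return "documented with precision estimate"


# Three entries taken from Table 1.
print(screen_solubility_report(20.0, True, 0.19))   # Mobay Report
print(screen_solubility_report(10.0, True, None))   # Bowman & Sans, 10 oC
print(screen_solubility_report(None, False, None))  # Merck Index
```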
	4.  As the database is still being established, it is too early to
apply or develop property estimation techniques for supplying missing
values.  As the current state of the art in property prediction for
complex chemicals (such as those found in the ARS PPD) is in its
infancy, it is probably better to wait for the field to develop
further before much effort is devoted to this area.
	5. The first release of the database is expected in mid-1989.  A
preliminary database of some 50 chemicals has been distributed to PPD
NRPCT and NACA members.  As most of the data come from NACA
industrial members, it has been agreed that they will have the
opportunity to review their data before they are released.
	6. A formal commitment to maintain and update the database was made
late in 1988 by senior ARS management.  This commitment has helped
convince industry to expand their voluntary cooperation and will
assure the project of a continuing source of data.
SUMMARY
      To date, most of the effort has been expended in searching the
literature and obtaining data from industry and other sources.  Cooperation
from industry has been essential to the project's progress.  The need for
data evaluation has been pointed up most strikingly by data on the
solubility of many pesticides.  In one particular case, it has been found
that for almost 30 years the solubility of fenthion has been reported in
handbooks as about 55 ppm, whereas the correct solubility (run under proper
experimental conditions) is actually 4.2 ppm.  This difference of a factor
of about 13 is too great to be ignored.  
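
      The size of the discrepancy can be checked directly from the two
values quoted.  A minimal sketch in Python (the variable names are ours):

```python
handbook_ppm = 55.0   # value repeated in handbooks, as quoted in the text
measured_ppm = 4.2    # value obtained under proper experimental conditions

ratio = handbook_ppm / measured_ppm
print(f"handbook / measured = {ratio:.1f}")
```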
      After a year of work on the first evaluated database of pesticide
properties we feel that the goals of the project can be met, but that more
time than originally anticipated will be required, as the available data
are more difficult to obtain and are less reliable than first expected.  An
interim database of values obtained will be our first goal, followed
closely by proper evaluation of the database, with appropriate data quality
indicator ratings.  
REFERENCES
1. Heller, S. R., Scott, K., and Bigwood, D. W., "The Need for Data
Evaluation of Physical and Chemical Properties of Pesticides", J. Chem.
Inf. Comput. Sci., submitted.
2. Heller, S. R., Bigwood, D. W., Schantz, M., Guenther, F., and May, W.
E., "Expert Systems for Analytical Chemistry Properties - Aqueous
Solubility", in preparation.
Table 1 - Chronological Order of Reported Solubility Values for Fenthion (mg/L)
| Value | Std. Dev. | Temperature (°C) | Experimental Conditions | Reference Name |
| 54-56 |           | Room Temperature | No                      | Schrader       |
| 54-56 |           | Room Temperature | No                      | Gunther        |
| 54-56 |           | Room Temperature | No                      | Spencer        |
| 55    |           | Room Temperature | No                      | Verschueren    |
| 54-56 |           | Room Temperature | No                      | Worthing/6th   |
| 56    |           | 22               | No                      | Khan           |
| 54    |           | None             | No                      | Eto            |
| 55    |           | None             | No                      | Merck Index    |
| 7.51  | 0.31      | 20               | Yes                     | Bowman & Sans  |
| 2     |           | 20               | No                      | Worthing/7th   |
| 6.4   |           | 10               | Yes                     | Bowman & Sans  |
| 9.3   |           | 20               | Yes                     | Bowman & Sans  |
| 11.3  |           | 30               | Yes                     | Bowman & Sans  |
| 54-56 |           | Room Temperature | No                      | Agrochem Hbk.  |
| 4.2   | 0.19      | 20               | Yes                     | Mobay Report   |
| 50    |           | 20               | No                      | Mackay et al.  |
| 50    |           | 20               | No                      | Farm Chem Hbk. |
Room Temperature was not defined in any reference, but is usually assumed
to be 15-25 °C.
The time frame for the references runs from 1960 to 1988.
Table 2 - Pesticides Currently in the ARS PPD Database
| Compound Name        | CASRN      |
| Alachlor             | 15972-60-8 |
| Aldicarb             | 116-06-3   |
| Ametryn              | 834-12-8   |
| Amitrole             | 61-82-5    |
| Anilazine            | 101-05-3   |
| Sodium Asulam        | 3337-71-1  |
| Atrazine             | 1912-24-9  |
| Methyl Azinphos      | 86-50-0    |
| Bromacil             | 314-40-9   |
| Bromoxynil Butyrate  | 3861-41-4  |
| Bromoxynil Octanoate | 1689-99-2  |
| Butifos              | 78-48-8    |
| Carbaryl             | 63-25-2    |
| Carbofuran           | 1563-66-2  |
| Chlordimeform        | 6164-98-3  |
| Chlorobenzilate      | 510-15-6   |
| Chloroxuron          | 1982-47-4  |
| Cyanazine            | 21725-46-2 |
| Cyfluthrin           | 68359-37-5 |
| Cypermethrin         | 52315-07-8 |
| Cyromazine           | 66215-27-8 |
| DCPA                 | 1861-32-1  |
| Diazinon             | 333-41-5   |
| Dipropetryn          | 4147-51-7  |
| Disulfoton           | 298-04-4   |
| Endosulfan           | 115-29-7   |
| Ethion               | 563-12-2   |
| Fenthion             | 55-38-9    |
| Fosetyl-al           | 39148-24-8 |
| Isofenphos           | 25311-71-1 |
| Metalaxyl            | 57837-19-1 |
| Methamidophos        | 10265-92-6 |
| Methidathion         | 950-37-8   |
| Methiocarb           | 2032-65-7  |
| Methyl Oxydemetron   | 301-12-2   |
| Permethrin           | 52645-53-1 |
| Phenamiphos          | 22224-92-6 |
| Phosphamidon         | 13171-21-6 |
| Profenofos           | 41198-08-7 |
| Prometon             | 1610-18-0  |
| Prometryn            | 7287-19-6  |
| Propoxur             | 114-26-1   |
| Quinomethionate      | 2439-01-2  |
| Sulprofos            | 35400-43-2 |
| Terbutryn            | 886-50-0   |
| Triadimefon          | 43121-43-3 |
| Trichlorfon          | 52-68-6    |
| Iprodione            | 36734-19-7 |
| Phosalone            | 2310-17-0  |
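
CAS Registry Numbers carry a built-in check digit, so keying errors in a
list such as Table 2 can be caught mechanically.  The Python sketch below
is our illustration and is not described as part of the PPD data entry
procedure; the function name is hypothetical.  It applies the standard CAS
check: the digits before the final hyphen are weighted 1, 2, 3, ... from
right to left, and the sum modulo 10 must equal the last digit.

```python
def cas_check_digit_ok(cas_rn: str) -> bool:
    """Return True if the final digit of a CAS RN matches its checksum."""
    parts = cas_rn.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return False
    first, second, check = parts
    if len(second) != 2 or len(check) != 1:
        return False
    body = first + second
    # Weight the body digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == int(check)


# Entries from Table 2, plus one deliberately mistyped number.
for cas_rn in ("55-38-9", "1912-24-9", "15972-60-8", "55-38-8"):
    print(cas_rn, cas_check_digit_ok(cas_rn))
```

A passing check shows only that the number is internally consistent; it
does not confirm that the number belongs to the compound named.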