Expert Systems for Evaluating Physico-Chemical Property Values

Expert Systems for Evaluating Physico-Chemical Property Values -
1. Aqueous Solubility

Stephen R. Heller *,
USDA, ARS, NPS
Beltsville, MD 20705-2350
SRHELLER@ASRR.ARSUSDA.GOV

Douglas W. Bigwood,
USDA, NAL
Beltsville, MD 20705-2351

and

Willie E. May
NIST
Gaithersburg, MD 20899

* Author to whom correspondence should be addressed

Abstract

Providing consistent data evaluation is critical to scientific studies. An expert system for evaluating the efficacy of the reported methodology for determining aqueous solubility is described and compared with two other similar manual data quality evaluation systems. The expert system, SOL, is a post-peer review filter for data evaluation. SOL has been designed to run on any IBM-PC compatible computer using the CLIPS public domain expert system shell.

Introduction

Data quality is an important issue frequently underemphasized in scientific publications. Without accurate data, proper decisions cannot be made. With the current information explosion, more and more data are being reported. A thorough evaluation of the quality of the data is often slighted in the rush to publish. In the past it was possible to either check a few journals or check with a few colleagues to find the specific data required for a project. Today, that is no longer the case since the scientific community has become much larger and the number and diversity of compounds that people are studying has become much greater (1). The work described here is an attempt to define the quality of published data and provide a way to improve the quality of data that will be published in the future.

As Lide has pointed out (1), CODATA, the Committee on Data for Science and Technology of the International Council of Scientific Unions, has proposed a three category scheme for classifying scientific and technical data. Class A data are repeatable measurements on well-defined systems. This would include data which are subject to verification by repeating the measurements in different labs at different times. Class B data are observational data, which are dependent on such variables as time, or in the case of agrochemical studies, they are dependent on the nature of the soil in a particular location. Often they cannot be repeated, and vary from one lab to another. Class C data are statistical data, which would include health data, demographic data, and so on. For class A data it should be possible to devise an acceptable method for evaluation of the data. That is, it should be possible to provide experimental evidence that the data are acceptable.

As concern over the quality of the nation's groundwater has increased in the past few years, the U.S. Agricultural Research Service (ARS) has been charged with the task of trying to model or predict the effects of pesticide contamination of groundwater. With the increased emphasis on the use of simulation and modeling, it is critical that accurate data be used. The parameters required for developing models of groundwater contamination include a number of fundamental chemical and physical properties of pesticides: aqueous solubility, vapor pressure, water/octanol partition coefficients, hydrolysis and photolysis rate constants, Henry's law constant, soil absorption, and so forth. We believe that all these types of data, except soil absorption data, fall within the CODATA definition of Class A data. Aqueous solubility is certainly an essential property for studies in this field, because pesticide solubility plays a major role in determining how much of a chemical can be dissolved and carried into the groundwater. Providing consistent and objective evaluation of physico-chemical data is critical both for planning future physico-chemical studies and for the effective use of data in modeling, assessments, and other activities.

As a search of literature containing aqueous solubility data was undertaken, it became clear very quickly that the variations in reported results were much greater than the expected scientific measurement errors could justify. Examples of the data variability for the aqueous solubility of representative pesticides can be seen from the plots of log solubility vs. 1/Temperature (K) in Figures 1a-1f. The plots of log (base 10) solubility vs. the inverse of the temperature (in Kelvin) should yield a straight line, but not a vertical straight line (such as one might consider drawing based on the predominance of the data for fenthion (Figure 1a)). The data in Figures 1a-1f are labeled as to their source (e.g., a handbook or the scientific literature; in the second plot in Figure 1e, the squares represent data from the literature) and their rating using the SOL expert system. However, such labeling seems to have little significance since most of the values used come from handbooks and the literature without reference or proper experimental information, and hence receive the rating of "U", or unknown quality (described later in the text). In addition to the values shown in Figures 1a-1f, there were other values from the scientific literature that had no specific temperature stated for the solubility reported and hence could not be plotted. These plots clearly indicate the overall inconsistency of the data reported in the literature and in handbooks, as well as the lack of quality control of data reported in these sources.
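The expected straight-line behaviour can be checked numerically. The following is a minimal Python sketch, assuming an ordinary least-squares fit of log10(solubility) against 1/T; the function name `fit_log_solubility` is hypothetical and the data points are invented for illustration, not taken from Figures 1a-1f.

```python
import math

def fit_log_solubility(points):
    """Least-squares fit of log10(S) = a + b * (1/T).

    points: list of (temperature_kelvin, solubility_mg_per_l) pairs.
    Returns (a, b, residuals), with residuals in log10 units.
    """
    xs = [1.0 / t for t, _ in points]
    ys = [math.log10(s) for _, s in points]
    if len(set(xs)) < 2:
        # All values at one temperature would give a "vertical line".
        raise ValueError("need at least two distinct temperatures")
    n = len(points)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_mean - b * x_mean
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    return a, b, residuals

# Hypothetical data (NOT from the paper): solubility rising with temperature.
example = [(283.15, 20.0), (293.15, 30.0), (303.15, 44.0), (313.15, 63.0)]
a, b, residuals = fit_log_solubility(example)
print(f"intercept={a:.3f}, slope={b:.1f} K")
print("max |residual| =", max(abs(r) for r in residuals))
```

Large residuals at a single temperature, or a set of values that cannot be fitted at all because they share one temperature, would point to the kind of scatter discussed above.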

Fenthion was the first pesticide that we examined closely for the quality of reported aqueous solubility data (2). Fenthion was introduced in 1957 and is used as an insecticide, primarily for ornamental plants. In this study, we found that most of the aqueous solubility values in the latest editions of handbooks were old and dubious, and all were given without reference to the source of the information. In general, further scrutiny of the values reported for aqueous solubility of pesticides made it clear that for any credibility to be given to groundwater modeling and chemical assessment efforts, the input data must be reliable.

We do not use the mean value or the value most frequently reported in the literature (primarily journals and handbooks), since it is not clear that the numbers are from different experiments. Furthermore, reported values can cluster around multiple values, as seen in Figure 1e. There are 8 values for the solubility of atrazine at 20 °C - 27 °C clustered about 30 mg/L or ppm (1.4 × 10^-4 mole/L); in the same temperature range there are 4 values clustered at 70 mg/L or ppm (3.2 × 10^-4 mole/L). Therefore a majority vote for the "best value" should not be considered an acceptable approach for evaluating solubility data.
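The molar values quoted for the two atrazine clusters follow from a simple unit conversion. The Python sketch below reproduces the arithmetic; the molar mass of atrazine (215.68 g/mol, from the formula C8H14ClN5) is not stated in the text and is supplied here as an assumption, and the function name is hypothetical.

```python
# Assumed value, not given in the text: molar mass of atrazine, C8H14ClN5.
ATRAZINE_MOLAR_MASS = 215.68  # g/mol

def mg_per_l_to_mol_per_l(mg_per_l, molar_mass_g_per_mol):
    """Convert a mass concentration in mg/L to a molar concentration in mol/L."""
    return (mg_per_l / 1000.0) / molar_mass_g_per_mol

for cluster in (30.0, 70.0):
    molar = mg_per_l_to_mol_per_l(cluster, ATRAZINE_MOLAR_MASS)
    print(f"{cluster:.0f} mg/L = {molar:.1e} mol/L")
# prints:
# 30 mg/L = 1.4e-04 mol/L
# 70 mg/L = 3.2e-04 mol/L
```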

Providing consistent and objective evaluation of physico-chemical studies is critical for planning future similar studies and effectively using data in modeling, assessments, and other activities. Since the current collection of solubility data was less than credible, it was desirable to develop a method to evaluate, to the extent possible, the quality of the data which have been reported. Lacking expertise in the analytical chemistry of solubility, we (SRH and DWB), the ARS scientists, initiated a joint project between the ARS and the Organic Analytical Research Division of the National Institute of Standards and Technology (NIST), formerly the National Bureau of Standards (NBS). The basic assumption in the critical evaluation of aqueous solubility data, using an expert system approach, is that sound methods should give good data. While a good evaluation system cannot rectify a poor measurement, it can assess the way the measurements were made and make the users of these data aware of their limitations and/or problems. The experience gained from the development of the SELEX Expert System for evaluating published data on selenium in foods (3) led to the rapid development of the ARS SOL aqueous solubility expert system described in this paper.

Data Quality Criteria

The first step in developing the Solubility Expert System, SOL, was to decide which criteria would be used in the evaluation scheme. Other criteria schemes for evaluating solubility have been developed by Yalkowsky (4) (Table 1) and Kollig (5) (Table 2). The works of Yalkowsky and Kollig provided a foundation on which this work is based. We produced a list of important factors that should be considered in a laboratory which plans to generate high quality aqueous solubility data. The final criteria were chosen from an analysis of the two existing schemes plus additional criteria which we felt to be most important and practical for the task at hand. Practicality was a major consideration in the final determination of the criteria that would be included in the SOL expert system. We feel that SOL is a robust system, but we chose to ask only the most important questions and deleted any questions which we felt would not affect the final data quality rating.

The Yalkowsky scheme, shown in Table 1, rates the quality with which the data are reported by assigning points for each of five criteria and averaging them; it provided an excellent starting point. In this scheme, all five criteria are weighted equally. However, when we looked into programming the scheme, a number of difficulties were discovered regarding the specificity of the criteria. The primary difficulty was felt to be that this scheme was not able to differentiate sufficiently between levels of data quality. This is due to the broad definitions of rating categories for a given parameter. For example, in Yalkowsky's scheme the parameter of purity gives the same rating to a compound of 50%, 90%, and 99.5% purity, so long as the range of the purity is given. The five categories used in the Yalkowsky scheme are also in the ARS/NIST scheme (Figures 2a and 2b). Yalkowsky does not prioritize or weight his data quality parameters, but rather provides an equal weighting to all criteria. The fundamental difference between Yalkowsky and the ARS/NIST scheme presented here is that we use a weighting factor for all criteria or parameters.

The Yalkowsky solubility database (6,7) is the most comprehensive database of solubility data. Being comprehensive, Yalkowsky has chosen to have the user decide which data to use. There is little critical evaluation of the data, and values are listed for aqueous solubility which, in some cases, are highly questionable (e.g., see fenthion, Figure 1a). The database does have the considerable benefit of referencing all reported values, but many are from handbooks or other compilations which do not refer back to the original literature. Therefore, it is not possible to check the validity of the value or learn about the experimental conditions used to determine the reported value. For example, solubility values where the temperature is not stated (i.e., either explicitly not stated or stated as room temperature) were considered not worth entering into our database. The data which Yalkowsky extracts directly from literature citations may be useful as part of the regulatory process, but are not always internally consistent. For example, listing the solubility as "insoluble", "almost insoluble", "moderately soluble", "nearly insoluble", "practically insoluble", "very soluble", "very slightly soluble", "sparingly soluble", or "soluble" was felt to be too subjective. Certainly, if a student in a quantitative analytical chemistry course reported a solubility using one of the terms in the previous sentence, it is not likely the student would receive a passing grade for that experiment. However, as demonstrated by the data collected by Yalkowsky, data compilations are blindly accepted by some groups in the scientific community. With the recent public concern about the quality of scientific studies (8), an expert system such as SOL has the potential to help alleviate the concerns about good scientific experiments and help reduce cases of blind acceptance of data results. One colleague has even suggested that such "expert systems could become codes of experimental procedure and documentation much more usable and complete than written manuals (9)". While Yalkowsky makes the reader choose which solubility value to use, the ARS/NIST criteria provide the reader with more information, and effectively make the choice for the user as to which solubility value to use.

For our database and evaluation scheme we decided that, without a stated temperature, the aqueous solubility value should not be accepted. Reporting the temperature as "room temperature" or "ambient" means different things to different people. Such terminology is not very specific since room temperature covers a range of more than ten degrees Celsius. For example, the solubilities of a few insecticides from the work of Bowman and Sans (9), shown in Table 4, reveal considerable variations in solubility over the temperature range from 20 °C to 25 °C, often referred to as "room temperature".

In addition, the purity of the solute must be at least 95% and the actual value for purity must be known if the information is to be useful. While a purity of less than 95% may be acceptable for some purposes, such as those found in regulatory agencies where the actual commercial formulations are evaluated, higher purity should be required for a scientific database of reference data. This is primarily due to the extreme bias impurities introduce. In a recent study (11) it was found that a small amount of the more soluble phenanthrene as an impurity in anthracene caused the apparent solubility of anthracene to be measured as 0.075 mg/mL instead of 0.045 mg/mL. The correct value is obtained when ultra-high purity anthracene is used, or when techniques are employed that isolate the anthracene signal from that of phenanthrene, such as fluorescence (12) (for optical resolution) or liquid chromatography (for separation in time; i.e., time resolution).

Yalkowsky's evaluation of the process involved in generating saturated solutions (equilibrium - agitation time) is good for any evaluation scheme. In our scheme an acceptable analytical method must be given. For the "analysis" parameter, we believe that unless the analytical method is stated, the data have insufficient merit to be used. Lastly, for the "accuracy and precision" parameter we felt that the number of significant figures did not take into account the absolute value of the aqueous solubility. It did not provide sufficient differentiation between errors in solubility values in the parts per million (ppm) and higher range as compared to the parts per billion (ppb) range. In the latter case (ppb) larger percentage errors are the norm since one is measuring values towards the lower limit of instrument sensitivity. The Yalkowsky scheme does not allow for the fact that state-of-the-art experimental precision may still give a large relative error when a very low solubility is measured. For example, we feel that reporting a value of 10 ppb with an error of 2 ppb is quite acceptable. This would be given a rating of "2" in our system (on a scale of 1-4, with 1 being the highest possible rating) while 10 ppm with an error of 2 ppm is less acceptable, and would be given a rating of "4" (the lowest rating) in our scheme. (Furthermore, a rating of "4" at this point would end the questioning and exit the user from the SOL system with the final rating value being "4".) In the Yalkowsky scheme both would be given a zero, the lowest rating for that parameter.

The Kollig solubility evaluation scheme (5), shown in Table 2, is part of a larger plan which discusses criteria for evaluating the reliability of data on 12 environmental parameters. Only the criteria for the evaluation of aqueous solubility are included in Table 2. The Kollig evaluation criteria are extensive, having 31 general questions about the experiment and 6 specific questions about the experimental data for solubility. Very few, if any, published reports are detailed enough to answer all of the questions. Even if it were possible to do so, it would be impractical to expect anyone to spend the necessary time answering all these questions in an evaluation scheme. While the Yalkowsky scheme seems rudimentary, the Kollig scheme seems to include more questions than are absolutely necessary to determine data quality. In addition, the Kollig scheme does not have a numerical score. We believe that it is unnecessary to answer all 37 questions to obtain an unambiguous decision on the quality of the data. It is unlikely that anyone would use a rating scheme which required so much time and effort. In addition, some of the questions are too ambiguous to be answered reproducibly by different analytical chemists. For example, any answer to the question "Could you repeat the analytical part with the available information?" is subjective without work on the part of the reader. Other questions are not relevant to the quality of the data. For example, the question "Is a clear objective stated?" has little connection with data quality, but rather probably relates to why a given level of accuracy was acceptable for that experiment. The question "Was the paper peer-reviewed?" may not be meaningful with regard to data quality. There are a number of reasons for this. One is that years after the paper was published part or all of the methodology used may have been discovered to be unreliable. Another reason is that the thrust of the paper may not have had to do with a specific physical or chemical parameter, and hence the reviewer may not have examined or reviewed specific data reported in the paper. It is not clear how one could readily answer the question "Do independent lab data confirm results?", as agreement with other studies is not usually presented. If there are no estimated data or no acceptable method for estimation, the question "Do estimated data confirm results?" cannot be answered. Lastly, for solubility data, asking "Was the chemical studied at a minimum of three temperatures?" would eliminate the vast majority of the reported data.

Analyzing the Yalkowsky and Kollig strategies led to the creation of an evaluation scheme with the seven parameters shown in Table 5 and described in detail in Figures 2a and 2b. The foremost considerations in developing the scheme were that the scheme had to give a reasonable answer (that is, one which a researcher in the field would reasonably expect) and the system had to be easy to use. In addition, the method was ordered so as to ask the minimum number of questions needed to get a realistic data quality rating. That is, not all questions need be asked if an early answer in the sequence of questions produces the same result as going through the entire sequence of questions. For example, if an early answer gives a "2" rating and subsequent responses would not raise the rating, the program terminates. If a critical question, such as solute purity or temperature, cannot be answered, then it was also deemed unnecessary to ask any further questions. We base the rating in our scheme on the lowest rating for any of the seven criteria and do not average the sum of all the criteria as is the case in the Yalkowsky scheme.
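The difference between averaging and taking the lowest (worst) rating can be illustrated with a short Python sketch. It is not the SOL implementation: the function names are hypothetical, the severity ordering that places "U" after "4" is an assumption, and early termination is modelled only for the terminal ratings "4" and "U".

```python
# Assumed ordering: "1" is best; "U" is treated as worse than "4".
SEVERITY = {"1": 1, "2": 2, "3": 3, "4": 4, "U": 5}

def equal_weight_average(numeric_ratings):
    """Equal-weight mean of per-criterion ratings (shown only for contrast)."""
    return sum(numeric_ratings) / len(numeric_ratings)

def worst_case_rating(criterion_ratings):
    """Return (overall rating, number of criteria examined).

    Ratings are examined in priority order. Evaluation stops as soon as a
    terminal rating ("4" or "U") is seen, because no later answer could
    improve the overall result.
    """
    worst = "1"
    examined = 0
    for rating in criterion_ratings:
        examined += 1
        if SEVERITY[rating] > SEVERITY[worst]:
            worst = rating
        if worst in ("4", "U"):
            break
    return worst, examined

# Hypothetical per-criterion ratings for one study, in priority order.
ratings = ["1", "2", "4", "1", "1", "1", "1"]
print(worst_case_rating(ratings))                      # ('4', 3)
print(round(equal_weight_average([int(r) for r in ratings]), 2))  # 1.57
```

In this invented case the average suggests a good study, while the worst-case rule reports the single poor criterion and stops after three questions.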

The ease of use, credibility of the experts who established the criteria, acceptability, and practicality of the scheme are considered critical if the scheme is to be accepted and used. Thus, additional questions which did not change the rating of a solubility value, were eliminated from the system. Furthermore, as described above, questioning terminates as soon as the ultimate rating is determined. The resulting questions are considered the minimum needed to obtain an objective and reproducible rating value. While these criteria of the SOL expert system may or may not be better than other published criteria, it is felt that they are clearly more practical.

Expert System Development

Once the criteria were selected, a computer system (of approximately 40 rules) concerned with the seven factors listed in Table 5, was created to evaluate the quality of reported aqueous solubility data. The system used was the NASA-developed C Language Interfaceable Production System, known as CLIPS (12) - a public domain expert system shell (13). CLIPS is written in the C programming language and is basically a forward chaining rule-based system based on the Rete pattern-matching algorithm (14). Having the program in the C language makes it portable to other computer systems. In fact, the CLIPS expert system shell now runs on the IBM compatible PC computers, the Apple Macintosh computer series, as well as the DEC VAX series of computers.

The evaluation scheme is based on approximately 40 rules which are concerned with seven general categories for its rule-making process, as shown in Table 3. The categories are given in the order of priority which we think reflects their scientific importance in determining the quality of aqueous solubility data. These criteria were ordered into a decision tree structure. The way in which the first four SOL rules used in the expert system were programmed in CLIPS is shown in Figure 3. The purpose of this figure is to show how uncomplicated the rules actually are. One can also see that a rule is simply a few lines of computer code. We hope this figure will dispel any notion the reader might have that expert systems must be sophisticated and complex computer programs.

Clearly, on one hand the order of the seven categories in Table 5 is subjective and the rating value decisions in Figures 2a and 2b for each category, or question, are subjective. (For example, the decision to rate 99% purity of the solute as the highest, rather than using 98%, is subjective.) However subjective some researchers may find these criteria and decisions, the ratings are believed to be defined properly enough and clearly enough so that they are reproducible. (Thus, again for discussion purposes, once we have chosen 99% or higher purity for the solute as the requirement for the highest rating, anyone using the system and entering a purity of 99% or greater will get the same rating result.) In addition, the criteria are objective in that they follow a scientific method of development and analysis. The results of the solubility evaluation range from a rating of "1" (the best) to a rating of "U" (unevaluatable), which means the experimental information provided in the publication, or source, is not sufficient to undertake an evaluation. The ratings and their meaning are as follows:

1 - Highest rating. Experimental method of high quality. Not many data values are expected to meet this high level of quality and few data will get this score.

2 - Good rating. Some parts of the experimental method were below the highest standards. Many experiments published in the literature and elsewhere will meet these criteria.

3 - Acceptable rating. Experimental methods were all defined, but work was performed or reported at a minimal scientific level. Many good experimental values will fall in this rating category, owing to poor reporting, either from a lack of space in the journal, or the secondary nature of the solubility data in the particular reference.

4 - Poor rating. Experimental method was given for all parts of the experiment, but the methods or values indicated poor experimental procedures. Some of the older studies fall in this category, as more recent analytical chemistry has shown some problems with older techniques. In some cases, researchers, often not analytical chemists or appropriately trained, did not undertake the solubility measurements as correctly as needed.

U - Unevaluatable. There is insufficient data/information to evaluate the numerical value presented. U could also stand for unknown, but it was felt the word unevaluatable gets across the point of poor experimental data reporting in a more forceful manner.

A number of published solubility studies were used to evaluate the rating scheme. The SOL expert system provided results which are consistent with the subjective opinions of the solubility experts at NIST and ARS who have tested the SOL program. For example, the Pesticide Manual (15), Merck Index (16), and the Agrochemical Handbook (17) do not publish references or methodologies, and therefore all the aqueous solubility data from these three sources are rated "U".

The expert system, called SOL, is an IBM-PC based computer program, which requires 256K of memory and less than 300,000 bytes of total disk storage (which allows it to fit on a low density floppy disk). Help messages have been written to assist users at every step of the evaluation scheme. A typical help message is shown in Figure 4. The SOL expert system program creates two disk files to keep track of the results. One file contains the final SOL expert system rating. The second file contains all the answers a user has entered in response to the questions, so it is possible to repeat an evaluation to see whether similar answers were given at different times or whether different people gave different answers. The system is both easy to use and easy to update with new rules. However, in order to assure consistency in ratings, the system is released only in a computer executable version (with no source code). Should changes need to be made in the SOL system in the future, these will be made and the new executable version distributed to all users of SOL.
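The value of keeping a separate answers file can be sketched in Python as follows. This is an illustration only: the file names, the JSON format and the question keys are hypothetical, since the text does not describe the layout of the files that SOL writes.

```python
import json
import os
import tempfile

def save_evaluation(directory, run_name, answers, rating):
    """Write one file with the final rating and one with all answers."""
    rating_path = os.path.join(directory, f"{run_name}_rating.txt")
    answers_path = os.path.join(directory, f"{run_name}_answers.json")
    with open(rating_path, "w", encoding="utf-8") as handle:
        handle.write(rating + "\n")
    with open(answers_path, "w", encoding="utf-8") as handle:
        json.dump(answers, handle, indent=2, sort_keys=True)
    return rating_path, answers_path

def differing_answers(answers_path_a, answers_path_b):
    """Return {question: (answer_a, answer_b)} for every answer that differs."""
    with open(answers_path_a, encoding="utf-8") as handle:
        first = json.load(handle)
    with open(answers_path_b, encoding="utf-8") as handle:
        second = json.load(handle)
    questions = sorted(set(first) | set(second))
    return {q: (first.get(q), second.get(q))
            for q in questions if first.get(q) != second.get(q)}

# Two hypothetical reviewers evaluating the same literature reference.
with tempfile.TemporaryDirectory() as workdir:
    _, path_a = save_evaluation(
        workdir, "reviewer_a",
        {"solute-purity": "a", "temperature-stated": "yes"}, "2")
    _, path_b = save_evaluation(
        workdir, "reviewer_b",
        {"solute-purity": "b", "temperature-stated": "yes"}, "3")
    print(differing_answers(path_a, path_b))
```

Comparing the two answer files shows at once which question led the reviewers to different ratings.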

The system can be run from either a floppy or hard disk. Routinely it is run on 80x86 CPU desk-top computers, as well as on portable computers. The response time for answering each question is effectively instantaneous on all computers. The SOL expert system is available upon request from the USDA (19).

System Testing

In order to test the SOL expert system, we both undertook some experimental work and analyzed some existing literature studies which we felt were of high quality. For the experimental solubilities we measured the solubility of Chloropyrifos and Bromocil (57). The results of these are shown in Figures 1e and 1f respectively. We believe these results, which lead to high ratings in our evaluation process, show that it is possible to obtain good solubility data by using and reporting good methodology.

Secondly, we analyzed a number of published reports on aqueous solubility data for three hydrocarbons and one PCB. In each case the literature citation was obtained and one of the authors went through the SOL expert system with the literature reference in hand and answered the questions from the system. This was repeated for all 22 solubility measurements found in the 15 literature citations (19-33). These results are shown in Table 5. The ratings are what the reviewers expected based on their subjective opinions concerning the quality of the solubility studies. The results of this table show that the values which have been assigned high ratings (2's and 3's) are consistent with the reviewers' opinion concerning the data quality. The SOL expert system also is able to screen out erratic values, such as the 1.29 × 10^-8 and 6.34 × 10^-8 values for the solubility of PCB 101. The fact that some apparently good or correct values have been given low ratings is due to the lack of experimental detail and information reported. Thus, while some good data may be rated lower, it does not appear that any reliably reported bad data have been given a higher rating than they deserve. In a perfect evaluation system this would not occur, but without the needed experimental information, that is, correct and honest documentation of the actual experimental work performed, which must be available when the SOL expert system is used, it is not possible to achieve perfection, let alone accurate results.

Summary

An easy to use and reproducible data evaluation scheme for aqueous solubility has been developed. The scheme has been implemented as an expert system, SOL, for IBM PC based computers. The SOL program is considered to be user friendly and requires a minimum number of questions to provide the user with a rating for the particular solubility study being examined. In testing the SOL software, the ratings which the system reports are in agreement with qualitative opinions of the experts who examined the various articles.

Additional expert systems for evaluating other CODATA Class A physical and chemical parameters are being developed, vapor pressure being the next property under study, and will be used in the evaluation of data for the ARS Pesticide Properties Database (PPD). It is hoped that other solubility data evaluation projects will also make use of the SOL program.

Acknowledgements

The authors wish to thank Michele M. Schantz and Franklin Guenther, NIST for their experimental work in support of this project. The authors also wish to thank Sam Yalkowsky for his valuable thoughts and comments on solubility data evaluation and Karen Scott for her assistance in the PPD data collection activities.

Figure captions for Figures 1a - 1d:
Title: Examples of Data Variability - Solubility Values are from the ARS PPD or the Yalkowsky SOLUB database. The data are plotted with the y axis being the log (base 10) of the solubility in mg/L and the x axis being 1/Temperature in Kelvin (multiplied by 10^-3).

Figure 1e
Title: Comparisons of solubility data from the literature and from NIST experiments for Chloropyrifos.

Figure 1f

Solubility data from NIST experiments for Bromocil

Footnote to Figures 1a-1f:

The numbers in parenthesis next to each point refer to the source (literature reference) of the data followed by the SOL rating value.

Table 2

Kollig Evaluation Scheme

Evaluation Criteria (Weighted 1-5)

Analytical Information

1. Is analytical method recognized as an acceptable method? (4)
2. Could you repeat the analytical part with the available information? (5)
3. Was the chemical analyzed within the linear range of the instrumentation? (3)
4. Is the detection limit stated? (2)
5. Was either HPLC or GC used? (4)
6. Is extraction efficiency stated? (3)
   (Ignored if HPLC was used and extraction was not done)
7. Was high purity solvent used for extractions? (3)
   (Ignored if extraction was not done)
8. Were product interferences checked for? (2)

Experimental Information

1. Could you repeat the experiment with available information? (5)
2. Is a clear objective stated? (1)
3. Is water quality characterized or identified? (2)
   (Distilled or deionized)
4. Are the results presented in detail, clearly and understandably? (3)
5. Are the data from a primary source and not from a referenced article? (3)
6. Was the chemical tested at concentrations below its water solubility? (5)
7. Were particulates absent? (2)
8. Was a reference chemical of known constant tested? (3)
9. Were other fate processes considered? (5)
10. Was a control (blank) run? (3)
11. Was temperature kept constant? (5)
12. Was the experiment done near room temperature (15-30 °C)? (3)
13. Is the purity of the test chemical reported (> 98%) ? (3)
14. Was the chemical's identity proven? (3)
15. Is the source of the chemical reported? (1)

Statistical Information

1. Were replicate samples analyzed? (3)
2. Were replicate sample systems run? (3)
3. Is precision of analytical technique reported? (3)
4. Is precision of sample analysis reported? (3)
5. Is precision of replicate sample systems reported? (3)

Corroborative Information

1. Was paper peer-reviewed? (5)
2. Do independent lab data confirm results? (3)
3. Do estimated data confirm results? (2)

Specific Additional Information for Solubility Data

1. Was final equilibrium shown over time? (5)
2. Was the reaction vessel capped? (3)
3. Was the sample kept in a constant temperature bath? (4)
4. Was a thermostated centrifuge used at the same temperature? (4)
5. Was stability of the chemical in water shown? (5)
6. Was the chemical studied at a minimum of three temperatures? (4)
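
The weights in Table 2 lend themselves to a simple tally. The following is a minimal sketch, not Kollig's published procedure: it assumes that each criterion is answered yes, no, or not applicable (for the "ignored" items), and that a score is the weight of the "yes" answers taken as a fraction of the applicable weight. The function name, the answer encoding, and the normalization are all assumptions made for illustration; only the weights for the Analytical Information group are taken from the table.

```python
# Hypothetical tally for one group of Table 2 criteria (Analytical Information).
# The weights come from the table; the scoring rule itself is an assumption.
ANALYTICAL_WEIGHTS = {
    "method_recognized": 4,
    "analysis_repeatable": 5,
    "within_linear_range": 3,
    "detection_limit_stated": 2,
    "hplc_or_gc_used": 4,
    "extraction_efficiency_stated": 3,
    "high_purity_solvent": 3,
    "interferences_checked": 2,
}


def weighted_score(weights, answers):
    """Return earned weight / applicable weight for yes/no answers.

    answers maps a criterion name to True (yes), False (no), or None
    (ignored / not applicable). Criteria missing from answers count as "no".
    """
    earned = 0
    applicable = 0
    for name, weight in weights.items():
        answer = answers.get(name, False)
        if answer is None:  # criterion ignored, e.g. no extraction was done
            continue
        applicable += weight
        if answer:
            earned += weight
    return earned / applicable if applicable else 0.0


example = {
    "method_recognized": True,
    "analysis_repeatable": True,
    "within_linear_range": True,
    "detection_limit_stated": False,
    "hplc_or_gc_used": True,
    "extraction_efficiency_stated": None,  # ignored: HPLC, no extraction
    "high_purity_solvent": None,           # ignored: no extraction
    "interferences_checked": False,
}
print(f"{weighted_score(ANALYTICAL_WEIGHTS, example):.2f}")
```

Skipping the ignored criteria in both the numerator and the denominator keeps a paper from being penalized for steps that did not apply to its method.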

Figure 3

Sample SOL rules as programmed using the CLIPS Expert System Shell program

;;;
;;; Start of questions
;;;

(defrule solute-purity
(analyzing solubility)
(not (solute-purity ?))
=>
(printout t crlf "Purity of solute")
(printout t crlf crlf "a. 99% <= Purity < 100%"
crlf "b. 95% <= Purity < 99%"
crlf "c. Purity < 95%"
crlf "d. Purity not stated")
(printout t crlf crlf "Enter purity (a-d): ")
(assert (solute-purity =(read 1010 a d))))

(defrule water-purity
(analyzing solubility)
(solute-purity ?)
=>
(printout t crlf "Purity of Water Used in Experimental Procedure"
crlf crlf "a. HPLC grade (organic free)"
crlf "b. distilled/demineralized"
crlf "c. demineralized"
crlf "d. distilled"
crlf "e. none of the above"
crlf crlf "Enter purity (a-e): ")
(assert (water-purity =(read 1020 a e))))

(defrule temperature-stated
(analyzing solubility)
(water-purity ?)
=>
(if (y-or-n-p 1030 0 "Was the temperature stated?")
then (assert (temperature-stated true))
else (assert (temperature-stated false))))

(defrule temperature-controlled
(analyzing solubility)
(temperature-stated true)
=>
(if (y-or-n-p 1040 0 "Was the temperature controlled?")
then (assert (temperature-controlled true))
else (assert (temperature-controlled false))))

(defrule temperature-corrected-for
(analyzing solubility)
(temperature-controlled false)
=>
(if (y-or-n-p 1050 0 "Was the solubility plotted vs. temperature or otherwise corrected for?")
then (assert (temperature-corrected true))
else (assert (temperature-corrected false))))
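
The rules in Figure 3 chain through the facts they assert: each question fires only once the preceding answer is known, and the temperature questions branch on the answers given. The sketch below restates that control flow in Python rather than CLIPS, purely to make the ordering explicit. The `ask` callback and the dictionary keys are hypothetical stand-ins for SOL's `y-or-n-p` helper and its asserted facts; the helper's definition is not shown in the figure.

```python
def temperature_questions(ask):
    """Walk the three temperature rules of Figure 3 in firing order.

    ask(prompt) must return True for "yes" and False for "no"; it is a
    hypothetical stand-in for the y-or-n-p helper used by SOL.
    """
    facts = {}
    facts["temperature-stated"] = ask("Was the temperature stated?")
    # temperature-controlled fires only if (temperature-stated true) exists
    if facts["temperature-stated"]:
        facts["temperature-controlled"] = ask("Was the temperature controlled?")
        # temperature-corrected-for fires only if control was NOT reported
        if not facts["temperature-controlled"]:
            facts["temperature-corrected"] = ask(
                "Was the solubility plotted vs. temperature "
                "or otherwise corrected for?"
            )
    return facts


# Scripted answers: stated = yes, controlled = no, corrected = yes
answers = iter([True, False, True])
print(temperature_questions(lambda prompt: next(answers)))
```

As in the CLIPS rules, a "no" to the first question ends the temperature branch, and the correction question is reached only when the temperature was stated but not controlled.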

Figure 4

HELP11 - Mean Value for the Solubility

This question determines the rating given to the reported standard error. Solubility values greater than 1 ppm should be reported to at least two (2) significant figures. If the mean solubility value is greater than 1 ppm (answer a) and it has fewer than two significant figures (i.e., a "no" answer is given to the question asking whether there are at least two significant figures in the mean value of the solubility), the solubility is given a "4" rating. If the answer is "yes", the program goes on to ask for the standard error (described in the next paragraph).

If the answer is category b or c (solubility less than 1 ppm, or solubility less than 100 ppb), you will then be asked for the standard error of the solubility value as a percentage of the mean value. The standard error is a number from 0 to 100.

The Standard Error Rating Table used by the SOL expert system is shown below:

                          Maximum Percent Standard Error
                          for a Particular Rating
Category Answer
(Concentration Range)     1       2       3
a                         5       10      20
b                         10      20      30
c                         10      25      50

Thus, for example, if the reported solubility is 10 ppm (category a), then a 10% error yields a "2" rating. A 10% error for a solubility value of 10 ppb (category c) yields a "1" rating. A 20% error for a solubility value of 200 ppb (category b) yields a "2" rating, but would yield a "3" rating if the solubility value were 100 ppm (category a).
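
The rating table above reduces to a threshold lookup. The following Python sketch is an illustration only, not the SOL source: it assumes that each maximum is inclusive (as the worked examples suggest) and that any error above the rating-3 maximum receives a "4". The function and table names are hypothetical.

```python
# Maximum percent standard error for ratings 1, 2, 3, by concentration category
# (values from the Standard Error Rating Table).
MAX_PERCENT_ERROR = {
    "a": (5, 10, 20),
    "b": (10, 20, 30),
    "c": (10, 25, 50),
}


def standard_error_rating(category, percent_error):
    """Return the 1-4 rating for a standard error given as a percent of the mean."""
    if not 0 <= percent_error <= 100:
        raise ValueError("standard error must be between 0 and 100 percent")
    for rating, maximum in enumerate(MAX_PERCENT_ERROR[category], start=1):
        if percent_error <= maximum:  # limits treated as inclusive (assumed)
            return rating
    return 4  # assumed: anything above the rating-3 limit gets the lowest rating


print(standard_error_rating("a", 10))  # 10 ppm, 10% error
print(standard_error_rating("c", 10))  # 10 ppb, 10% error
print(standard_error_rating("b", 20))  # 200 ppb, 20% error
```

Taking the category letter as input, rather than the concentration itself, avoids guessing how SOL treats values that fall exactly on the 1 ppm or 100 ppb boundaries.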

Table 5 Factors considered in the ARS/NIST Aqueous Solubility Data Evaluation Scheme

1. solute purity
2. water purity
3. temperature/temperature control
4. polarity of solute
5. precision
6. accuracy
7. saturated solution methodology

References

1. Lide, D., Critical Data for Critical Needs, Science, 212, 1343-49 (1981).

2. Heller, S. R., Scott, K., and Bigwood, D. W., "The Need for Data Evaluation of Physical and Chemical Properties of Pesticides", J. Chem. Info. Comput. Sci., 29, 159-162 (1989).

3. Bigwood, D. W., Heller, S. R., Wolf, W. R., Schubert, A., and Holden, J., "SELEX: An Expert System for Evaluating Published Data on Selenium in Foods", Anal. Chim. Acta, 200, pages 411-419 (1987).

4. Yalkowsky, S., Presentation at "The Estimation of Physical Data for Organic Compounds", Beilstein Workshop, Bolzano, Italy, 16-20 May, 1988.

5. Kollig, H. P., "Criteria for Evaluating the Reliability of Literature Data on Environmental Process Constants", Tox. Environ. Chem., 17, 287-311 (1988).

6. Adb, the ARIZONA dATAbASE of Aqueous Solubility, College of Pharmacy, University of Arizona, Tucson, AZ 85721.

7. See, for example, Science, 244, page 765, 19 May 1989 and pages 642-646, 12 May 1989.

8. Personal communication, 5 June 1989, from S. Pacenka, Cornell University, New York State Water Resources Institute, Ithaca, NY 14853-3501.

9. Bowman, B. T., and Sans, W. W., "Effect of Temperature on the Water Solubility of Insecticides", J. Environ. Sci. Health, B20(2), pages 625-631 (1985).

10. May, W. E., Wasik, S. P., and Freeman, D. H., "Determination of the Aqueous Solubility of Polynuclear Aromatic Hydrocarbons by a Coupled-Column Liquid Chromatographic Technique", Anal. Chem., 50, 1 (1978).

11. Schwarz, F. J., "Determination of Temperature Dependence of Solubilities of Polycyclic Aromatic Hydrocarbons in Aqueous Solutions by a Fluorescence Method", J. Chem. Eng. Data, 22, 273-277 (1977).

12. CLIPS, Catalog # MSC-21208, is available for the IBM PC, Macintosh, and VAX computers from the NASA COSMIC Software Catalog, 1988 Edition, 382 East Broad Street, Athens, GA 30602. The cost is $250 for the software and $62 for the documentation.

13. Raeth, P. G., "Two PC-based Expert System Shells for the First-time Developer", Computer, 73-81, November 1988.

14. Bridgeland, D. and Lafferty, L., "Scavenger: An Experimental Rete Compiler", Proceedings of the International Society for Optical Engineering, 635, 487-496 (1986), Bellingham, WA.

15. "The Pesticide Manual, A World Compendium", 7th edition, Edited by Worthing, C. R., and Walker, S. B., The British Crop Protection Council, 144-150 London Road, Croydon CR0 2TD, England, 1987.

16. "The Merck Index", 10th Edition, G. Delko, Editor, Merck & Co., Inc., Rahway, NJ 07065-0900, 1983.

17. Royal Society of Chemistry, "The Agrochemicals Handbook", 2nd Edition, Royal Society of Chemistry, The University, Nottingham NG7 2RD, England, 1987.

18. Requests for the IBM PC version should be addressed to: Office of Genome Mapping, USDA, ARS, Bldg. 005, Room 333, Beltsville, MD 20705. It is available only on a 5 1/4 inch floppy disk.

19. Tewari, Y.B., Miller, M.M., Wasik, S.P., and Martire, D.E., Aqueous Solubility and Octanol/Water Partition Coefficient of Organic Compounds at 25.0 °C, J. Chem. Eng. Data, 27, 451-454 (1982).

20. McAuliffe, C., Solubility in Water of Paraffin, Cycloparaffin, Olefin, Acetylene, Cycloolefin, and Aromatic Hydrocarbons, J. Phys. Chem., 70, 1267-1275 (1966).

21. Fühner, H., Die Wasserlöslichkeit in Homologen Reihen, Chem. Ber., 57, 510-515 (1924).

22. Sanemasa, I., Araki, M., Deguchi, T., and Nagai, H., Solubility Measurements of Benzene and The Alkylbenzenes in Water by Making Use of Solute Vapor, Bull. Chem. Soc. Jpn., 55, 1054-1062 (1982).

23. Polak, J. and Lu, B. C.-Y., Mutual Solubilities of Hydrocarbons and Water at 0 and 25 °C, Can. J. Chem., 51, 4018-4023 (1973).

24. Nelson, H.D. and DeLigny, C.L., The Determination of the Solubilities of Some n-Alkanes in Water at Different Temperatures, by means of Gas Chromatography, Recueil, 87, 528-544 (1968).

25. Miller, M.M., Ghodbane, S., Wasik, S.P., Tewari, Y.B., and Martire, D.E., Aqueous Solubilities, Octanol/Water Partition Coefficients, and Entropies of Melting of Chlorinated Benzenes and Biphenyls, J. Chem. Eng. Data, 29, 184-190 (1984).

26. Dickhut, R. M., Andren, A. W., and Armstrong, D. E., Aqueous Solubilities of Six Polychlorinated Biphenyl Congeners at Four Temperatures, Environ. Sci. Technol., 20, 807-810 (1986).

27. Burkhard, L.P., Armstrong, D.E., and Andren, A.W., Henry's Law Constants for the Polychlorinated Biphenyls, Environ. Sci. Technol., 19, 590-596 (1985).

28. Haque, R. and Schmedding, D., A Method of Measuring the Water Solubility of Hydrophobic Chemicals: Solubility of Five Polychlorinated Biphenyls, Bull. Environ. Contam. & Toxicol., 14, 13-18 (1975).

29. Weil, V. L., Duré, G., and Quentin, K.-E., Wasserlöslichkeit von Insektiziden Chlorierten Kohlenwasserstoffen und Polychlorierten Biphenylen im Hinblick auf eine Gewässerbelastung mit diesen Stoffen, Wasser und Abwasser-Forschung, 6, 169-175 (1974).

30. Chiou, C.T., Freed, V.H., Schmedding, D.W., and Kohnert, R.L., Partition Coefficient and Bioaccumulation of Selected Organic Chemicals, Environ. Sci. Technol., 11, 475-478 (1977).

31. Dexter, R.N. and Pavlou, S.P., Mass Solubility and Aqueous Activity Coefficients of Stable Organic Chemicals in the Marine Environment: Polychlorinated Biphenyls, Mar. Chem., 6, 41-53 (1978).

32. Bohon, R.L. and Claussen, W.F., The Solubility of Aromatic Hydrocarbons in Water, J. Amer. Chem. Soc., 73, 1571-1578 (1951).

33. Sutton, C. and Calder, J.A., Solubility of Alkylbenzenes in Distilled Water and Seawater at 25.0 °C, J. Chem. Eng. Data, 20, 320-322 (1975).

34. Brust, H. F., "A Summary of Chemical and Physical Properties of Dursban", Down To Earth, 22, 21-22 (1966).

35. Felsot, A. and Dahm, P., "Sorption of Organophosphorus Carbamate Insecticides by Soil", J. Agric. Food Chem., 27, 557-563 (1979).

36. "The Pesticide Manual", 4th edition, Edited by Martin, H., The British Crop Protection Council, 144-150 London Road, Croydon CR0 2TD, England, 1977.

37. Table A-2, page 220 in Khan, S. U., "Pesticides in the Soil Environment", Volume 5 in the series "Fundamental Aspects of Pollution Control and Environmental Science", Elsevier, Amsterdam, 1980.

38. "Farm Chemicals Handbook", Meister Publishing Co., 37841 Euclid Avenue, Willoughby, OH 44094, USA 1988 (216-942-2000).

39. Suntio, L. R., Shiu, W. Y., Mackay, D., Seiber, J. N., and Glotfelty, D., "A Critical Review of Henry's Law Constants for Pesticides", Rev. Environ. Contam. Tox., 103, 1-59 (1988).

40. Bowman, B. T., and Sans, W. W., "Further Water Solubility Determinations of Insecticidal Compounds", J. Environ. Sci. Health, B18(2), pages 221-227 (1983).

41. Mobay Corporation, Agricultural Chemicals Division, Report #94648, "Water Solubility of Fenthion Pure Active Ingredient", 1987.

42. Weed Science Society of America, "Herbicide Handbook", 5th edition, 1983. This handbook is available from the Weed Science Society of America (WSSA), 309 W. Clark Street, Champaign, IL 61820 USA (217-356-3182).

43. Spillner, C. J., Thomas, V. M., Takahashi, D. G., and Scher, H. B., "A Comparative Study of the Relationships Between the Mobility of Alachlor, Butylate, and Metolachlor in Soil and Their Physicochemical Properties", Chapter 12 in ACS Symposium Series #225, "Fate of Chemicals in the Environment", ACS Washington, DC 1983.

44. Y. Eshel, "Phytotoxicity, Leachability, and Site of Uptake of 2-chloro-2',6'-diethyl-N-(methoxymethyl) Acetanilide", Weed Sci., 17, 441-444 (1969).

45. Ciba Geigy, "Toxicology Data", Technical Report 1977, page 1, Department of Industrial Medicine, Agricultural Division, Ardsley, NY 10502.

46. Beilstein, P., Cook, A. M., and Hutter, R., "Determination of Seventeen s-Triazine Herbicides and Derivatives by HPLC", J. Agric. Food Chem., 29, 1132-1135 (1981).

47. Calvet, R., Terce, M., and Le Renard, J., "Kinetics of Dissolution of Atrazine, Propazine, and Simazine in Water", Weed Res., 15, 387-392 (1975).

48. Hartley, G. S., and Graham-Bryce, I. J., "Physical Principles of Pesticide Behavior", Volume 4, Appendix 4, Academic Press (1980).

49. Bartley, C. E. "Triazine Compounds", Fm. Chem., 122, 28-34 (1959).

50. Melnikov, N. N., "Chemistry of Pesticides", Residue Reviews, Volume 36, Springer-Verlag, New York (1971).

51. Internal data submitted to the states of Arizona and California, Ciba-Geigy Corporation, Agrochemical Division, Animal Health and State Regulatory Affairs, PO Box 18300, Greensboro, NC 27419.

52. Hormann, W. D., and Eberle, D. O., "The Aqueous Solubility of 2-chloro-4-ethylamino-6-isopropyl-1,3,5-triazine (Atrazine) Obtained by an Improved Analytical Method", Weed Res., 12, 199-202 (1972).

53. Hurle, K. B. and Freed, V. H., "Effect of Electrolytes on the Solubility of some 1,3,5-triazines and Substituted Ureas and their Adsorption on Soil", Weed Res., 12, 1-10 (1972).

54. Verschueren, K., "Handbook of Environmental Data on Organic Chemicals", page 231, 2nd Edition, Van Nostrand Reinhold, New York, 1983.

55. Ward, T. M., and Weber, J. B., "Aqueous Solubility of Alkylamino-s-Triazines as a Function of pH and Molecular Structure", J. Agric. Food Chem., 16, 959-961 (1968).

56. Getzen, F. W. and Ward, T. M., "Influence of Water Structure on Aqueous Solubility", Ind. Eng. Chem. Prod. Res. Develop., 10, 122-132 (1971).

57. M. M. Schantz and F. Guenther, private communication. The chemicals used had a purity of 99% or higher and were not ionizable. HPLC grade, organic free water was used as the solvent. The solubility experiments were done over several days in a controlled temperature bath. At 25 °C, 12 different measurements were made and the reported value is their average. At each of the other temperatures, at least six different measurements were made and averaged to produce the reported value. All the data were plotted as solubility vs. temperature, as shown in Figures 1e and 1f.