Analytical Chemistry Resources on the Internet



Stephen R. Heller
USDA, ARS, Beltsville, MD 20705-2350 USA
SRHELLER@ASRR.ARSUSDA.GOV


INTRODUCTION

Never has a technical subject been so popularized so quickly by the press, and adopted by the public, as the information highway of the future - the Internet (1). With a growth rate of some 10-15% per month, trying to write about the resources available to the analytical chemistry community is like mud wrestling - a real mess. While there are many changes taking place in the use of computers in chemistry, particularly in how chemists work, obtain information and publish (2), this presentation will emphasize only one - the Internet.

Once upon a time, some two decades ago, the US Department of Defense's (DOD) Advanced Research Projects Agency (ARPA) created a computer network to connect DOD researchers around the USA. From this limited and humble beginning, the Internet has developed. It is now a nebulous collection of 10,000+ computers hooked together connecting 15 million (plus another 10,000 by the time you finish reading this article) academic, industrial, and government users of computers throughout most of the world (1a). And yes, even Siberia is on the network! The Internet is a resource which would have been science fiction just a few years ago, but today is a reality for sharing documents, data, databases, information, software, and ideas. If you thought dealing with your local government or any large bureaucracy was difficult, wait until you go into the cyberspace of the Internet.

The purpose of this article is both to whet your appetite for what is available on the Internet and to provide some very basic information and suggestions on how to access the Internet and how to locate and retrieve this information. A complete list of resources is both impossible to compile (as the Internet resources seem to grow faster than a cancer cell multiplies and change faster than the AIDS virus mutates) and too overpowering for anyone to handle, or for TrAC to publish. The goal of this article is to provide enough starting points for you to go off into the Internet (called cyberspace) and explore for yourself. For further information about the Internet, the recent book by Ed Krol is a very good starting point (3).



INTERNET RESOURCES

Most universities and government offices are on the Internet, as well as a growing number of commercial firms (4). The resources on the Internet are not all free (actually none are free - someone has to pay for all the connections, the hardware and software to interface to the other computers), but for all practical purposes, for better and/or worse, they appear free. Certainly the bulk of what I will describe here is "effectively" free. Very valuable commercial database resources for analytical chemists, such as the Chemical Abstracts Service's STN network and DIALOG, both available on the Internet, require paid accounts for access. Two important points need to be made at this time. First, the resources on the Internet come and go, as local decisions about what is available change like the tides. Secondly, and most importantly, there are no accepted quality control procedures for Internet resources. Some are good, some are not. Often there are two versions of the same information, so be sure you locate the most recent source. Some sources are copyrighted and should not be accessible or accessed, but they are, so use proper judgement.



Before going any further please look at Table 1 where a few critical terms are defined. As electronic information is used more widely as we go further into the electronic information age, there is now a book on how to properly cite such information (5).

Table 1


Condensed Internet Glossary

ARCHIE: A database, originally developed at McGill University in Canada, which keeps track of the millions of files on the over 900 ftp'able computers on the Internet.

E-Mail: Allows one to electronically send and receive messages via computer.

FTP: The file transfer protocol, by which one can electronically request a file (of data or a software program) from a remote computer or send a file to it. Normally one uses the login name "anonymous" and one's own Internet address (e.g., "srheller@asrr.arsusda.gov") as the password to get into the computer from which you want to get a file.

Gopher: A software package developed at the University of Minnesota to allow someone to "go-for" files and readily search information physically located on other computers on the Internet. (Actually the name comes from the university mascot.)

Internet: A collection of computers located throughout the world which are connected together and which support four major services: electronic mail (e-mail), bulletin boards/discussion groups, file transfers between machines (ftp), and remote logins from any machine on the network (telnet).

USENET: A group of computer systems that exchange news. Most of the USENET bulletin boards are of a very general nature and there are over 3500 of these news groups.

VERONICA: Named for ARCHIE's girlfriend. Veronica, developed at the University of Nevada, helps to access gopher directories and the files they contain.

WAIS: The Wide Area Information Server is a tool to quickly and easily retrieve database information from computers on the Internet. It is likely to be replaced by a similar but more powerful search tool, the WWW (described next).

WWW: Similar to the WAIS but different to the extent that the World Wide Web is a Swiss developed software search system allowing one to search databases in a "hypertext" (where data is directly linked to other data) fashion.

INTERNET ADDRESSES

Internet addresses are usually given a name, which is an alias for the real numeric address. In as many cases as possible I will try to give both, as some computer system software requires the numeric address, while others are more tolerant of simple names.

ARCHIE (a shortened version of "ARCHIvE") is a continuously updated database of about 900 FTP'able sites for innumerable software items and files. There are a number of identical ARCHIE database sites around the world:

quiche.cs.mcgill.ca 132.206.2.3 (The original ARCHIE)
archie.sura.net 128.167.254.179
archie.rutgers.edu 128.6.18.15
archie.ans.edu 147.225.1.2
archie.funet.fi 128.214.6.100 (Europe)
archie.doc.ic.ac.uk 146.169.3.7 (UK)
archie.au 128.184.1.4 (Australia)

To access any of these systems, login as "ARCHIE", and no password is required. The two most useful commands are WHATIS (e.g., whatis chemistry) and PROG (e.g. prog ampac). WHATIS locates software by subject, and PROG locates software by ftp site. You will need to use PROG to find one (usually of many) possible ftp addresses.
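The pairing of a host name with its numeric address can be kept in a small lookup table, so that either form is at hand when a program insists on one of them. The following is only a sketch in Python: the host names and numbers are copied from the ARCHIE list above, while the function names parse_host_table and numeric_address are made up for this illustration. Nothing here contacts the network; it only checks that each numeric address is a well-formed dotted quad.

```python
import ipaddress

# Name / numeric-address pairs copied from the ARCHIE list above.
ARCHIE_HOSTS = """\
quiche.cs.mcgill.ca 132.206.2.3
archie.sura.net 128.167.254.179
archie.rutgers.edu 128.6.18.15
archie.ans.edu 147.225.1.2
archie.funet.fi 128.214.6.100
archie.doc.ic.ac.uk 146.169.3.7
archie.au 128.184.1.4
"""


def parse_host_table(text):
    """Return {name: IPv4Address} for lines of the form 'name a.b.c.d'."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        name, numeric = fields[0], fields[1]
        # IPv4Address raises ValueError if the dotted quad is malformed.
        table[name.lower()] = ipaddress.IPv4Address(numeric)
    return table


def numeric_address(name, table):
    """Look up the numeric address recorded for a host name (hypothetical helper)."""
    return str(table[name.lower()])


if __name__ == "__main__":
    hosts = parse_host_table(ARCHIE_HOSTS)
    print(numeric_address("archie.rutgers.edu", hosts))
```

Host names are compared in lower case because they are not case sensitive, so "ARCHIE.AU" and "archie.au" find the same entry.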

EXAMPLE OF INTERNET ACCESS VIA ftp

There are a number of chemistry resources on the Internet, and the best way to find them is from lists of such resources collected by someone and put on the network. One source of chemistry related Internet sites can be found on the computer host machine in Greece called leon.nrcps.ariadne-t.gr (or 143.233.2.1). On this machine there is a file called "/pub/chemistry/sites.chem" which lists locations of chemistry related resources. On the same computer is a list of chemistry related mailing lists, bulletin boards, and news groups which are found in a file called "/pub/chemistry/m_lists.chem".

To obtain these files (some of which are simple text or ascii and some of which are binary) one can simply ftp them. The following example shows how I was able to ftp (connect) to the computer (in this case a computer with a UNIX operating system) which has the file called "sites.chem", transfer that file back to my computer's disk area (a VAX computer with a VMS operating system in my case) and then type out the first few records of this file showing a number of sites where information can be obtained. What I typed is shown in bold, the remainder is the computer response. (There is a good deal of verbiage, which is typical of such activity.) Basically all that one does is ftp to the computer you want to access, log in as "anonymous", go to the directory/sub-directory containing the file that you want, use the command "get" to transfer the file to your computer (which took 58.11 seconds for a file of 53633 bytes from the computer in Greece), and then, once you log off the remote computer, type the first few lines of the file you just moved to your computer. The last part of the output in Figure 1 lists the first seven sources, with the names of their files, from machines in England (.uk), Germany (.de), Japan (.jp), Germany (.de), Texas (utexas.edu), Australia (.au), and Michigan (umich.edu). Hopefully the names provided by the authors of the files are descriptive enough to let you know if you are interested in actually obtaining copies of them. It should be noted that different computer systems (VAX, Sun, HP, etc.) often have different commands, so blindly repeating what is shown below at another ftp site may not work, in part or in whole.

Figure 1

ftp leon.nrcps.ariadne-t.gr
asrr.arsusda.gov Wollongong FTP User Process (Version 5.1)
Connection Opened
Using 8-bit bytes.
220 leon FTP server (SunOS 4.1) ready.
Name (leon.nrcps.ariadne-t.gr:srheller): anonymous
331 Guest login ok, send ident as password.
Password: (srheller@asrr.arsusda.gov)
230 Guest login ok, access restrictions apply.
*cd /pub/chemistry
250 CWD command successful.
*dir
200 PORT command successful.
150 ASCII data connection for /bin/ls (192.94.164.2,3900) (0 bytes).
total 6
drwxr-xr-x 2 root 1 512 Apr 30 1993 coll_chem.doc
drwxr-xr-x 2 root 1 512 Jul 19 08:02 ftp.sites.chem
drwxr-xr-x 2 root 1 512 Jul 19 08:01 m_lists.chem
-r--r--r-- 1 root 1 1332 Jul 19 07:59 readme.07-93
drwxr-xr-x 2 root 1 512 Jul 19 08:03 related_fields
226 ASCII Transfer complete.
348 bytes transfered in 1.50 seconds (0.23 Kbytes/second)
*cd ftp.sites.chem
250 CWD command successful.
*dir
200 PORT command successful.
150 ASCII data connection for /bin/ls (192.94.164.2,3923) (0 bytes).
total 68
-r--r--r-- 1 root 1 14877 Jul 19 08:02 prgms-dtbs.07-93
-r--r--r-- 1 root 1 53633 Jul 19 08:02 sites.07-93
226 ASCII Transfer complete.
146 bytes transfered in 0.62 seconds (0.23 Kbytes/second)
*get sites.07-93
200 PORT command successful.
150 Binary data connection for sites.07-93 (192.94.164.2,3925) (53633 bytes).
226 Binary Transfer complete.
53633 bytes transfered in 58.11 seconds (0.90 Kbytes/second)
*bye
221 Goodbye.
$ dir
Directory USER$DISK:[SRHELLER]
SITES.07-93;1
$ type sites.07-93;1
Directory: /pub/chemistry/ftp.sites.chem, file sites.07-93
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*Note* Refer to file prgms-dtbs.07-93 at directory
~~~~~~ /pub/chemistry/ftp.sites.chem, from this host.

----------------------------------

achilles.doc.ic.ac.uk (146.169.2.37)
/tex/aston/digests/texline/no12 file chemist.tex
/tex/aston/digests/texline/no6 file chemistry.tex
/tex/aston/latex/contrib/chemtex
aix370.rrz.uni-koeln.de (134.95.132.2)
/chemie
/msdos/mswindows3/demos file chemwin.zip
/tex file ChemTeX.sh.Z


akiu.gw.tohoku.ac.jp (130.34.8.9)
/pub/mac/graphics file mac-molecule-15.hqx
/pub/mac/graphics file mac-molecule-17.hqx
/pub/mac/graphics file mac-molecule-images.hqx
/pub/mac/graphics/quicktime file macmolecule-movie.hqx
/pub/mac/graphics/quicktime file molecule-movie.hqx


alice.fmi.uni-passau.de (132.231.1.180)
/pub/TeX/dhdurz1 file chemstrt.zoo

andy.che.utexas.edu (128.83.162.5)
/pub/gnu/gawk-2.13.2/test/chem

archie.au (139.130.4.6)
/graphics/gif/m file molecule
/micros/mac/info-mac/app file mac-molecule-17.hqx
/micros/mac/info-mac/app file mac-molecule-images.hqx
/micros/mac/info-mac/art file 3d-molecule.hqx
/micros/mac/info-mac/art/qt file macmolecule-movie.hqx
/micros/mac/info-mac/Old/app file mac-molecule-15.hqx
/micros/mac/info-mac/Old/app file mac-molecule-part1.hqx.Z
/micros/mac/info-mac/Old/app file mac-molecule-part2.hqx.Z
/micros/mac/info-mac/Old/art/qt file molecule-movie.hqx
/micros/mac/info-mac/Old/card file chemical-inventory-201.hqx.Z
/micros/mac/info-mac/Old/card file chemical-inventory-21.hqx
/micros/pc/oak/education file chemical.arc
/micros/pc/simtel-20/education file chemical.arc
/usenet/comp.sources.misc/volume2 file molecule.Z
/usenet/comp.sources.unix/volume24/chemtab

archive.umich.edu (141.211.32.2)
/mac/hypercard/science
/mac/misc/biology
/mac/misc/chemistry file acidsandbasespartone.sit.hqx
/mac/misc/chemistry file acidsandbasesparttwo.sit.hqx
/mac/misc/chemistry file aminoacid.sit.hqx
/mac/misc/chemistry file atomicstructurepartone.sit.hqx
/mac/misc/chemistry file atoms.sit.hqx
/mac/misc/chemistry file ballandstick3.04demo.cpt.hqx
/mac/misc/chemistry file bonding.sit.hqx
/mac/misc/chemistry file chem101.sit.hqx
/mac/misc/chemistry file chemequilibrium.sit.hqx
/mac/misc/chemistry file chemicalkinetics.sit.hqx
/mac/misc/chemistry file chemistrychapter1.sit.hqx
/mac/misc/chemistry file chemquiz.sit.hqx
/mac/misc/chemistry file chemriddles.sit.hqx
/mac/misc/chemistry file crystaltutordemo.cpt.hqx
/mac/misc/chemistry file elements.sit.hqx
/mac/misc/chemistry file elementtwo.sit.hqx
/mac/misc/chemistry file gas.sit.hqx
/mac/misc/chemistry file gasspectra.cpt.hqx
/mac/misc/chemistry file heatofvap.sit.hqx

(The list continues but I stopped the printing at this point.)
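For those who would rather script such a transfer than type it, the steps of Figure 1 (log in as "anonymous", change directory, get the file) can be sketched with Python's standard ftplib module. This is an illustration under stated assumptions, not a record of how the session above was run: the function names fetch_file, kbytes_per_second and download_site_list are invented here, the host, directory and file name are simply those shown in Figure 1, and the e-mail address stands in for your own. ftplib uses passive-mode transfers by default, whereas the session in Figure 1 shows PORT commands, so the server replies would differ in detail.

```python
from ftplib import FTP


def fetch_file(ftp, directory, filename, local_path):
    """Change to `directory` on an already logged-in connection and copy
    `filename` to `local_path` in binary mode; return the bytes written."""
    ftp.cwd(directory)
    written = 0
    with open(local_path, "wb") as out:
        def save(chunk):
            # Called by ftplib once for each block of data received.
            nonlocal written
            out.write(chunk)
            written += len(chunk)
        ftp.retrbinary("RETR " + filename, save)
    return written


def kbytes_per_second(n_bytes, seconds):
    """Transfer rate in Kbytes/second, taking 1 Kbyte = 1024 bytes."""
    return n_bytes / 1024 / seconds


def download_site_list(email_address):
    """Repeat the Figure 1 transfer (hypothetical wrapper; needs network access)."""
    with FTP("leon.nrcps.ariadne-t.gr") as ftp:
        # Anonymous ftp: the e-mail address serves as the password.
        ftp.login(user="anonymous", passwd=email_address)
        return fetch_file(ftp, "/pub/chemistry/ftp.sites.chem",
                          "sites.07-93", "sites.07-93")
```

Calling download_site_list("srheller@asrr.arsusda.gov") would attempt the same transfer as in Figure 1. As a check on the arithmetic, kbytes_per_second(53633, 58.11) gives about 0.90, which agrees with the rate reported at the end of the "get" command.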

Another source of information, which is likely to increase in the future, is ftp sites from publishers. One such system is that of Springer-Verlag. Their machine, trick.ntp.springer.de, has a variety of information, ranging from the tables of contents of their journals to demonstration versions of programs, such as the MOBY molecular modeling system. Just recently ISI has provided access to The Scientist, via ftp, at ds.intercis.net in the pub/the-scientist directory. No doubt other publishers will soon begin to create similar resources.



BULLETIN BOARDS AND LIST SERVERS

The best way to find out more about what is available is to join one (or more) of the many bulletin boards, discussion groups, or list servers which are available on the Internet. A sample of these is listed, in alphabetic order, below in Table 2 (6). To gain access (and often 10-20+ e-mail messages a day, giving a new meaning to the phrase "junk-mail") just send an e-mail to the Internet address for the list and, as the subject heading and/or content of the mail message, type "subscribe". (If you find the discussion group verbiage too much or not of interest, the reverse command is "unsubscribe", for either a period of time (while you are on vacation) or forever.)

Table 2


Examples of Chemistry List-servers

ACS COMP Newsletter

The Newsletter of ACS Computer Division (COMP) is now available in electronic form from the Computational Chemistry List archives. To retrieve it, ftp to kekule.osc.edu and look in the pub/chemistry/comp_news directory or mail the command send ./comp_news/vol17.txt from chemistry to oscpost@osc.edu.

Buckminster Fullerene mailer.

To subscribe, send the message: INTRO or HELP to: bucky@sol1.lrsm.upenn.edu. To get the bibliography, send the keyword BIBLIO to the same address. The service is maintained by Jack Fischer's group at the University of Pennsylvania.

Buckyball database.

This fullerene database is accessible via anonymous ftp at: physics.arizona.edu. login: ftp passwd:your e-mail address. It is found in: /usr/ftp/asc. The site has a program called PCBIB that allows searches of the database by keywords. The resource is based on materials from Professor Richard E. Smalley's BuckyBall Bibliography. One can telnet and search for this information at sabio1.library.arizona.edu. Login is "sabio". Choose "Other databases" from the menu.

Computational Chemistry List.

To subscribe, send the message: send help from chemistry to: OSCPOST@oscsunb.osc.edu

CHEM-COMP, Computational Chemistry.

To subscribe, send the message: join chem-comp firstname lastname to: mailbase@mailbase.ac.uk



CHEMED-L, Chemistry Education Discussion List.

To subscribe, send the message: subscribe chemed-l firstname lastname to: listserv%uwf.bitnet@cunyvm.cuny.edu


CHMINF-L, the Chemical Information Sources Discussion List.

Indiana University. CHMINF-L may be joined by sending the message: subscribe chminf-l to: listserv@iubvm.ucs.indiana.edu CHMINF-L covers all information sources that can be used to answer questions a chemist might have.



CMTS-L, the Chemical Management and Tracking Systems List.

CMTS-L serves as a forum for the exchange of ideas on the establishment of computerized systems to manage chemical inventories. To subscribe, send the message: subscribe CMTS-L firstname lastname to: listserv@cornell.edu



CORROS-L, The Corrosion Interest List.

To subscribe, send the message: subscribe corros-l firstname lastname to: listralv@ib.rl.ac.uk



FORENS-L, Forensic Sciences Discussion Group.

To subscribe, send the message: subscribe firstname lastname to: FORENS-REQUEST@ACC.FAU.EDU



HIRIS-L, High Resolution IR Spectroscopy List.

Send the message: SUB HIRIS-L firstname lastname to: listserv@iveuncc.


ICS-L, International Chemometrics Society.

To subscribe, send the message: subscribe ICS-L to: listserv@umdd.umd.edu



JACS Supplementary Material (January 1993-).

JACS supplementary material is available as TIFF images, or when supplied by the author, as ASCII files. Retrieval can be done by ftp, electronic mail, or direct connection to the ACS information server. This is not a free service.



Microscale List-server.

List-server for exchange of information on microscale materials. To subscribe, send the message: subscribe microscale-l@merrimack.edu



Molecular Modeling lists:
dibug-request@comp.bioz.unibas.ch (BIOSYM)
charmm-bbs-sysop@emperor.harvard.edu (CHARMM)
amber-request@cgl.ucsf.edu (Amber)
sybylreq@quant.chem.rpi.edu (Sybyl)
hyperchem-request@autodesk.com (Hyperchem)



MOSSBA-L, Mossbauer Spectroscopy, Software & Forum.

Send the message: subscribe mossba-l firstname lastname to: listserv@usachvm1

REACTIVE: A discussion list about air sampling and monitoring of short-lived pollutants. Send the message: subscribe reactive firstname lastname to: listserv@vm1mcgill.ca

SAFETY.

The laboratory safety list can be joined by sending the message: subscribe safety to: listserv@uvmvm.uvm.edu

Springer-Verlag Journals Preview Service.

Tables of contents, titles (article heads), and summaries (abstracts) of papers from 30 Springer journals in the life sciences and radiology are available three to six weeks before the appearance of the printed version. There is a modest annual fee for the abstract information, but the other data can be had at no cost. To subscribe, send the message: help to: svjps@dhdspri6



SVSERV, Springer-Verlag demonstration files.

The server has new books published by Springer-Verlag, demonstrations of their software and databases, tables of contents, etc. Send the command: subscribe inf to: svserv@vax.ntp.springer.de



USENET News Groups: sci.chem; sci.chem.organomet; sci.engr.chem; sci.polymers. The USENET News Groups allow easy access to a variety of topics. Check with your local computer service to see how to access these services.
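Most of the subscription requests in Table 2 follow the same pattern: a one-line command mailed to the list server's address. As a rough sketch, such a message could be composed with Python's standard email package. The function names subscribe_command and list_request are invented for this illustration, and the list and address used in the example are the CMTS-L entry from Table 2. The code only builds the message; how it is actually sent depends on your local mail system.

```python
from email.message import EmailMessage


def subscribe_command(list_name, first_name, last_name):
    """Build the one-line command used by many of the lists in Table 2."""
    return f"subscribe {list_name} {first_name} {last_name}"


def list_request(sender, server_address, command):
    """Return an e-mail carrying `command` as both subject and content,
    since different list servers read one or the other."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = server_address
    msg["Subject"] = command
    msg.set_content(command)
    return msg


if __name__ == "__main__":
    request = list_request(
        "srheller@asrr.arsusda.gov",   # replace with your own address
        "listserv@cornell.edu",        # CMTS-L server, from Table 2
        subscribe_command("CMTS-L", "Stephen", "Heller"),
    )
    print(request)
```

Lists that expect a different command (for example "join chem-comp firstname lastname" for CHEM-COMP) can be handled by passing that command string to list_request directly.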

SUMMARY

This quick tour of the Internet has provided a limited snapshot of what is available. Only by taking the time to explore the vast number of storehouses of information connected to the network will you be able to take advantage of the Internet and make it a useful resource for your everyday activities. But one thing is certain: this is a resource which you will not be able to neglect, as electronic information and data become part of the daily life of the chemist.


References

1. For example, see a) The New York Times, page 1, November 3, 1993; b) The Wall Street Journal, page B1, September 3, 1993; c) The New York Times, page C1, May 18, 1993; and d) The Washington Post, Business Section, page 5, May 17, 1993.

2. S. R. Heller, "Chemical Information Activities: What the Future Holds", J. Chem. Inf. Comput. Sci., 33, 284-291 (1993).

3. THE WHOLE INTERNET - User's Guide & Catalog by Ed Krol, O'Reilly & Associates, 103 Morris Street, Suite A, Sebastopol, CA 95472 (Phone: 800-998-9938; FAX: 707-829-0104). The cost of this book is about $25. This company also has a number of other items about the Internet and related subjects. For details write to the company or get their latest product announcements by sending the request "subscribe ora-news firstname lastname organization name" by e-mail to listproc@online.ora.com.

4. For those who do not have access to the Internet in their organization, a number of commercial companies are now providing Internet access for a fee. One such service is DELPHI, which has over 600 local dial up access points in the USA. You can get a free trial of this service by instructing your modem to dial 1-800-365-4636. Press the carriage return (CR) once or twice and use "joindelphi" as the user login name and "cpt31x" as the password. For additional information call 1-800-695-4005.

A similar service is being developed in Europe by EUnet Limited. They can be contacted by phone at 31-20-592-5109 (FAX: 592-5155) or by e-mail at info.eu.net

5. "Electronic Style: A Guide to Citing Electronic Information", Meckler Publishing, 11 Ferry Lane West, Westport, CT 06880. The price of this 65 page book is $15.

6. Much of this information was compiled by Dr. Gary Wiggins (WIGGINS@UCS.INDIANA.EDU) at Indiana University, who runs the chemical information list-server. This list (of over 1400 subscribers) is an excellent place to ask for help on finding chemical information or chemical data sources.