1
A Snapshot Overview of Large Chemical Structure Databases

Stephen R. Heller

steve@hellers.com
2
The slides from this presentation can be found at:

http://www.hellers.com/steve/pub-talks/
(Goslar – November 2007 link)
3
This will be an atypical talk.

 I have trouble being normal.
4
Disclaimer

The opinions presented on these slides are those of the slides and not necessarily those of the speaker.

No animals were harmed in the preparation of this talk; however, quite a few WWW sites were hit.
 
These slides were made from 100% recycled electrons.
 
This will be a well balanced presentation. I have a chip on both shoulders.
5
Outline of Presentation
  • 1. Background/History
  • 2. Description/Contents of the Databases
  • 3. Features of the Databases
  • 4. InChI/InChIKey


6

With apologies to Shakespeare-

“It is a tale
Told by an idiot,
full of sound and fury,
Signifying nothing”


Macbeth - Act 5, Scene 5. Dunsinane. Within the castle.
7
Background
  •    This lecture will cover a number of large (arbitrarily defined as greater than 1 million structures) chemical structure databases currently available, or soon to be available, on the Internet. Just a few years ago there were only two very large but rather different databases of organic chemicals available: Beilstein and Chemical Abstracts. There was also one large database of chemical structures associated with chemical reactions (SPRESI). Within the past 1-2 years the situation has changed dramatically, with some 20 large structure databases of all sorts becoming available, some commercial and some no-fee/open data.
8
In this presentation the emphasis will be on the content, or lack of content, of these databases, including the chemical structures and related/linked information. The one central theme for almost all of these projects and databases is that they are aimed at scientists in the pharmaceutical/drug/biotechnology communities.
9
One important point to note is that this presentation is driven by quantity, not quality. Thus, a relatively small database such as the Protein Data Bank (PDB), while of great importance to the pharmaceutical/drug/biotechnology community, will not be discussed here.
10
The 23 Chemical Structure Databases
  • Ambinter – 5.5 million
  • Bio-Rad – 1 million
  • ChemDB – UC Irvine – 5 million
  • Chemical Abstracts – 31 million
  • Chemisches Zentralblatt – 1.5 million
  • ChemNavigator – 25 million
  • ChemSpider – 17 million
  • Crossfire Beilstein – 10 million
  • Derwent Chemistry Resource – 1 million
  • DiscoveryGate – 20 million
  • eMolecules – 7 million
11
The 23 Chemical Structure Databases
  • Generated Database (GDB) – Berne – 26 million
  • GVK BIO – 1.4 million
  • IBM Patent Database – 4 million
  • Index Chemicus – 2.8 million
  • NCI – 30 million
  • QueryChem – Harvard – 10 million
  • PubChem – 11 million
  • Ryan Scientific – 1.9 million
  • SPRESI – 6 million
  • SureChem Patents – 10 million
  • Thomson Pharma – 2.4 million
  • ZINC – UCSF – 5 million


12
 There is much less here than meets the eye.
13
Ambinter
  • Supplier of advanced chemicals – worldwide


  • Paris, France


  • http://www.ambinter.com/


  •  From their website:
  •    You can receive our latest CD-ROM (contact us to request a CD – a subset of the main database). The main database has to be searched by structure/substructure/similarity online - use the Search from the home page.
14
Ambinter
  • Links:


  • None
15
Bio-Rad Informatics
  • KnowItAll® U(niversity) is a unique spectroscopy resource for research and teaching. KnowItAll U puts the largest single collection of spectra (over 1.3 million IR, NMR, MS, Raman, UV-Vis, and Near IR) at the fingertips of every student, faculty, and staff member in your institution—at any computer, campus-wide. In addition, KnowItAll U offers award-winning chemistry, spectroscopy, and chemometrics software.
16
Bio-Rad Informatics
  • Links:


  • Chemical Names
  • SMILES
17
ChemDB
  •     ChemDB is a chemical database containing some 5 million commercially available small molecules, important for use as synthetic building blocks, as probes in systems biology and as leads for the discovery of drugs and other useful compounds. The data is publicly available over the web for download and for targeted searches using a variety of powerful methods. The chemical data includes predicted or experimentally determined physicochemical properties, such as 3D structure, melting temperature and solubility. A text-based search engine allows efficient searching of compounds based on over 65 million annotations from over 150 vendors. Built-in reaction models enable searches through virtual chemical space, consisting of hypothetical products readily synthesizable from the building blocks in ChemDB.


  • Availability: ChemDB and Supplementary Materials are available at http://cdb.ics.uci.edu


  • Contact: pfbaldi@ics.uci.edu
18
ChemDB
  • URL: http://cdb.ics.uci.edu/CHEM/Web/


  • Sources of chemicals for the database:
  • http://cdb.ics.uci.edu/CHEM/Web/cgibin/supplement/Implementation.py#source


  • June 2007 manuscript:
  • http://bioinformatics.oxfordjournals.org/cgi/reprint/23/17/2348?ijkey=swjzipsmJeGWWzS&keytype=ref


19
ChemDB
  • Links:


  • CAS RN (incomplete)
  • Chemical Names
  • SMILES
  • InChI
20
Chemical Abstracts
  • URL: http://www.cas.org


  •    For the past 100 years CAS has indexed and summarized chemistry-related articles from about 9,500 journals (a number which has been decreasing over the years), as well as patents, conference proceedings, books, and other documents pertinent to chemistry, life sciences and related areas. Since 1907 the database has accumulated over 30 million abstracts and 32 million chemical structures.


21
Chemical Abstracts
  • Links:


  • CAS RN’s
  • Chemical Names (the most numerous of all the databases: index, common and trivial names)
22
Chemisches Zentralblatt
  •    URL: None yet


  •    Chemisches Zentralblatt began its life as Pharmaceutisches Centralblatt in 1830. Between 1830 and 1897 it underwent a number of changes in its title and publisher, until in 1897 it was renamed for the final time as Chemisches Zentralblatt. As a result of WWII it stopped publishing in 1945 but resumed a few years later. However, it never recovered its pre-war status and was finally terminated in 1969.


  •     It is currently being digitized, and structures are being extracted from the names in the ~1.8 million abstracts.
23
Chemisches Zentralblatt
  • Links:


  • Chemical Names
24
ChemNavigator
  •    URL: www.chemnavigator.com


  •    The iResearch Library, created and assembled by ChemNavigator,  is ChemNavigator's up‑to‑date compilation of commercially accessible screening compounds from international chemistry suppliers. The database currently tracks over 40 million chemical samples from some 270 suppliers. The database contains some 21 million unique structures.
25
ChemNavigator

  • Links:


  • CAS RN’s (incomplete)
  • Chemical Names (incomplete)
  • InChI
  • SMILES
26
ChemSpider
  • URL: http://www.chemspider.com/


  •     ChemSpider is a chemistry search engine. It has been built with the intention of aggregating and indexing chemical structures and their associated information into a single searchable repository and making it available to everybody, at no charge. ChemSpider is a value-added offering since many properties have been added to each of the chemical structures within the database – structure identifiers such as SMILES, InChI, IUPAC and Index Names as well as many physicochemical properties. We intend ChemSpider to offer the fastest chemical structure searches available online and delivered with the flexibility and usability necessary to encourage repeat usage.
27
ChemSpider

  • Links:


  • Chemical Names
  • InChI/InChIKey
  • SMILES





28
ChemSpider
  •      What problems will ChemSpider solve?


  •      There are tens if not hundreds of chemical structure databases and no single way to search across them. There are databases of curated literature data, chemical vendor catalogs, molecular properties, environmental data, toxicity data, analytical data and on and on.


  •      The only way to know whether a specific piece of information is available for a chemical structure is to have simultaneous access to all of these databases. Since many of these databases are for-profit, there is no easy way to determine the availability of information within these commercial databases, or even within the open access ones. With ChemSpider the intention is to aggregate into a single database all chemical structures available within open access and commercial databases and to provide the necessary pointers from the ChemSpider search engine to the information of interest. This service will allow users to either access the data immediately via open access links or have the information necessary to continue their searches into commercially available systems. The question "is there specific information about my chemical" will be answered. Accessing the information may require a commercial transaction with the appropriate provider.
29
 
DATA SOURCES
Count Data Source

637917 ACD/Labs
2973506 AKos
362469 ASINEX
1207 BIND
8492 BindingDB
1660 BioCyc
10458 CambridgeSoft Corporation
58 CC_PMLSC
7742 ChEBI
413586 ChemBank
107570 ChemBlock
433971 ChemBridge
3564938 ChemDB
156247 ChemExper Chemical Directory
383789 ChemIDplus
1603 Chirals
1307 CiVentiChem
1629 CMLD-BU
68400 CombiUgi
1040 Diabetic Complications Screening
2739765 DiscoveryGate
4496 DrugBank
268696 DTP/NCI
1280 Emory University Molecular Libraries Screening Center
829193 Enamine
14862 EPA DSSTox
1573 FDA
30

3439 Human Metabolome Database
201926 Journal of Heterocyclic Chemistry
16938 KEGG
1869 KUMGM
724 LeadScope
8882 LipidMAPS
26437 Marinlit
10655 MDPI
146 MICAD
127973 MLSMR
59084 MMDB
1951 MOLI
105566 MTDP
1121 Nanogen
994 Nature Chemical Biology
11080 NCGC
138703 NIAID
1040 NINDS Approved Drug Screening Program
177495 NIST
54175 NIST Chemistry WebBook
4282 NMMLSC
16123 NMRShiftDB
10 PANACHE
27 PCMD
2871 PDSP
31
168420 Peptides
3423 Prous Science Drugs of the Future
10909738 PubChem
144 QSAR
10609 R&D Chemicals
1915893 Ryan Scientific
53807 San Diego Center for Chemical Genomics
319 SGCOxCompounds
17 SGCStoCompounds
6591 Single Depositions
7162 SMID
200099 Specs
2 Structural Genomics Consortium
85185206 SureChem (Patents)
1814 SynChem
63697 Synthon-Lab
2260604 Thomson Pharma
16125 TOSLab
910 Total TOSLab Building-Blocks
556118 UkrOrgSynthesis
1010 UM-BBD
2111 UPCMLD
238 UsefulChem
20 Web of Science
2462 xPharm
3813892 ZINC
32
Crossfire Beilstein
  •    The current Crossfire Beilstein database consists of somewhat over 10 million structures and 320 million pieces of experimental data. The Beilstein database provides chemical data on organic substances and reactions, including structures, properties, bioactivity records, preparation details and specific reaction pathways; it also provides citations to, and some abstracts of, the primary organic chemistry literature. It incorporates ALL of the data from the original Beilstein Handbuch (1771-1984) and from journals abstracted since 1980.
33
Crossfire Beilstein
  • Links:


  • Beilstein numbers
  • CAS RN’s (incomplete)
  • Lawson numbers
  • Chemical Names
  • InChI/InChIKey
34
Derwent
  •    Derwent Chemistry Resource (Dialog File 355) lets you find specific chemical compounds within Derwent World Patents Index (WPI) records. Unique numbers identify specific chemical compounds and form the link between Derwent Chemistry Resource and the corresponding bibliographic indexing in Derwent WPI.
35
Derwent

  • Links:


  • Chemical Names
36
DiscoveryGate
  • URL www.discoverygate.com


  •     DiscoveryGate® from Symyx Technologies is a collection of a number of databases (including the Crossfire Beilstein database) designed to provide scientific information and answers to pharma/drug discovery questions. A web-based discovery environment, DiscoveryGate integrates, indexes, and links scientific information to give the user immediate access to compounds and related data, reactions, original journal articles and patents, and authoritative reference works on synthetic methodologies - all from a single entry point.


37
DiscoveryGate
  • Links:


  • CAS RN (incomplete)
  • Chemical Names
  • InChI/InChIKey
38
eMolecules
  • URL:  www.emolecules.com/


  •     eMolecules® describes itself as the leading open-access chemistry search engine. eMolecules' mission is to discover, curate and index all of the public chemical information in the world, and make it available to the public for free. eMolecules consists primarily of chemical catalogs and other public databases, such as PubChem. They have recently added spectral data from Wiley to their system.
39
eMolecules
  • Links


  • CAS RN’s (incomplete)
  • Chemical Names
  • SMILES


40
GDB - Berne
  • URL: http://dcbwww.unibe.ch/groups/reymond/


  •    GDB is a large (26 million structures) database of generated structures, which the Reymond group believes is of value for drug discovery. The group has taken a first look at this chemical space by constructing a database of all molecules of up to 11 atoms, under constraints that define chemical stability and synthetic feasibility. The database contains 26.4 million compounds, the vast majority of which have never been synthesized.
41
GDB - Berne

  • Links:


  • SMILES



42
GVK BIO

  • Links:


  • Chemical Names
  • SMILES
43
GVK BIO
  • URL: http://www.gvkbio.com/informatics.html


  • These databases are built from journal and patent information. The information covers both the chemical and the biological space of the reported molecules.


  • These databases contain information on pharmacokinetics, toxicity and clinical relationships from journal articles, patents, reviews, clinical trials and all other possible sources, both public and private, and are updated periodically.
44
IBM Patents
  •    Steve Boyer of IBM has taken a copy of the computer-readable version of the US Patent databases and extracted over 7 million chemical names, which he converted into a searchable structure file. Concept terms have also been tagged, and links to the NLM PubMed database have been made.
45
IBM Patents

  • Links:


  • Chemical Names
  • InChI/InChIKey
  • SMILES
46
Index Chemicus
  •     Index Chemicus is a text- and substructure-searchable database of the structures from Thomson Web of Science (WOS) reports. It adds over 200,000 new compounds each year, with a total coverage of over 2.8 million unique structures published in the literature since the early 1990s. Index Chemicus covers the world's leading organic chemistry journals and offers full graphical summaries, important reaction diagrams, complete bibliographic information, and author abstracts.
47
Index Chemicus

  • Links:


  • Chemical Names
48
NCI - Chemical Structure Lookup Service (CSLS)
  •     CSLS is a new web-based system for locating chemical structures in over 70 different public and commercial data sources. The CSLS system stores information on over 30 million chemical structures and provides a simple search interface for looking up chemicals by specific structure, by parent structure, and by various identifiers.


  •     The goal in creating CSLS was to provide one publicly accessible system that cross-references multiple cheminformatics data sources based on chemical structure. Scientists can use this system to find what information is available about a specific chemical structure or a list of structures by quickly identifying the databases in which these structures occur. The links are, in general, not direct links, as most of the databases are fee-based and not directly available.
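The cross-referencing idea behind CSLS can be pictured as an inverted index from a structure key to the sources that hold that structure. This is a minimal sketch, not how CSLS is actually built: it assumes each source can be reduced to a set of structure keys (InChIKeys here), and the source names `SourceA`/`SourceB` and the helper names `build_index`/`lookup` are hypothetical.

```python
from collections import defaultdict

def build_index(sources: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert {source name: structure keys it holds} into {structure key: source names}."""
    index: dict[str, set[str]] = defaultdict(set)
    for source, keys in sources.items():
        for key in keys:
            index[key].add(source)
    return dict(index)

def lookup(index: dict[str, set[str]], query_keys: list[str]) -> dict[str, list[str]]:
    """For each queried structure key, list the sources that contain it (empty if none)."""
    return {key: sorted(index.get(key, set())) for key in query_keys}

# Hypothetical sources and contents, for illustration only.
sources = {
    "SourceA": {"BJHIKXHVCXFQLS-UYFOZJQFBH"},
    "SourceB": {"BJHIKXHVCXFQLS-UYFOZJQFBH", "BJHIKXHVCXFQLS-FUTKDDECBR"},
}
index = build_index(sources)
print(lookup(index, ["BJHIKXHVCXFQLS-FUTKDDECBR", "NOTINANYSOURCE"]))
# {'BJHIKXHVCXFQLS-FUTKDDECBR': ['SourceB'], 'NOTINANYSOURCE': []}
```

The lookup answers only "which sources have it"; as the slide notes, actually reaching the data may still require going to a fee-based provider.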
49
NCI - Chemical Structure Lookup Service (CSLS)

  • Links:


  • CAS RN’s (incomplete)
  • Chemical Names
  • InChI/InChIKey
  • SMILES
50
PubChem
  • URL: pubchem.ncbi.nlm.nih.gov/


  •    PubChem is a DEPOSITION system that provides information on the biological activities of small molecules. It is a component of NIH's Molecular Libraries Roadmap Initiative. The easiest way to learn more about how to use the PubChem resources is to go to their expanding help page:


  •     http://pubchem.ncbi.nlm.nih.gov/help.html
51
QueryChem

  • Links:


  • Chemical Names
  • SMILES
52
QueryChem
  •    QueryChem (www.QueryChem.com) is a Web program that integrates chemical structure and text-based searching using publicly available chemical databases and Google's Web Application Program Interface (API). QueryChem is simply a combination of the databases from ChemBank, PubChem, and eMolecules. QueryChem makes it possible to search the Web for information about chemical structures without knowing their common names or identifiers. Furthermore, a structure can be combined with textual query terms to further restrict searches. QueryChem searches can retrieve many interesting structure-property relationships of biomolecules on the Web.
53
Ryan Scientific


  • Links:


  • Chemical Names
54
Ryan Scientific
  • Ryan Scientific specializes in the sales and marketing of chemicals required by biotechnology, pharmaceutical and agricultural research companies and universities throughout North America. Our products are primarily focused on drug and ag discovery research and are used both in High Throughput Screening (HTS) and in organic synthesis, using combinatorial and structure-based techniques.


  • Their combined catalog of chemicals comes from over 100 suppliers:


  • http://www.ryansci.com/adVend.htm
55
SPRESI
  •    SPRESI is a chemical structure and reaction database that includes over 5 million structures, 3.7 million reactions and 28 million factual data entries extracted from 600,000 references and 164,000 patents. It was introduced in 2002 by Infochem. It includes Synthesis Tree Search, which searches for published synthesis reactions leading to and from the target. The SPRESIweb data have been abstracted from over 1350 literature sources, mostly journals.
56
SPRESI

  • Links:


  • Chemical names
57
SureChem Patents
  • Search more than 10 million  chemical structures
  •   Complete full text collections of US, European and WO/PCT patents
  •   Structures updated within days of new patent issuance
  •   Advanced chemical structure and patent search tools
  •   Export structure and text search results
  •   Powerful result filtering and query navigation tools
58
SureChem
  • Links:


  • Chemical Names
  • InChI/InChIKey
  • SMILES
59
Thomson Pharma

  • URL: http://www.thomson-pharma.com/


  •   The 2.4 million unique structures in Thomson Pharma are drawn from the other Thomson databases, some of which have already been mentioned:


  • Derwent Drug File – 123,000
  • Derwent World Patent Index - 1.05 million
  • ISI Index Chemicus - 2.8 million
  • Current Chemical Reactions – 561,000


  •  BUT it is not simply the sum of all the others, for the following reasons:


  • 1. The structures are de-duplicated across the sources


  • 2. Only "pharmaceutically relevant" compounds are included: e.g., from DWPI, only those from section B are included (about 2/3); from IC (Index Chemicus), only those from a subset of journals or with biological activity are included (just under half the total). A few extra compounds, e.g. from IDDB and CFT, are also included.
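Why the merged total is smaller than the sum of its sources can be seen with a toy merge: filter each source, then take the set union, so structures shared between sources are counted once. This is a minimal sketch under stated assumptions, not Thomson's procedure: the structure keys `K1`…`K5`, the source names, the helper `merge_sources` and the relevance filter `keep` are all hypothetical.

```python
from typing import Callable

def merge_sources(sources: dict[str, list[str]],
                  is_relevant: Callable[[str, str], bool]) -> set[str]:
    """Union the structure keys of all sources, keeping only records the filter accepts.

    Duplicates across (and within) sources collapse because the result is a set.
    """
    merged: set[str] = set()
    for source, keys in sources.items():
        merged.update(key for key in keys if is_relevant(source, key))
    return merged

# Hypothetical structure keys and filter, for illustration only.
sources = {
    "Source1": ["K1", "K2", "K3"],
    "Source2": ["K2", "K3", "K4"],
    "Source3": ["K4", "K5"],
}

def keep(source: str, key: str) -> bool:
    return key != "K5"  # pretend K5 is judged not "pharmaceutically relevant"

merged = merge_sources(sources, keep)
print(sum(len(v) for v in sources.values()), len(merged))  # 8 4
```

Both effects on the slide appear here: de-duplication removes the shared K2, K3 and K4 records, and the relevance filter removes K5.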


60
Thomson Pharma
  • Links:


  • Chemical Names
61
ZINC



  • URL: blaster.docking.org/zinc/


  •      ZINC is a free database of commercially available compounds for virtual screening. ZINC contains over 4.6 million compounds in ready-to-dock 3D formats. ZINC is provided by the Shoichet Laboratory in the Department of Pharmaceutical Chemistry at the University of California, San Francisco (UCSF). There was a descriptive write-up on ZINC in Chemical & Engineering News (C&EN) in 2005:


  •          http://pubs.acs.org/cen/news/83/i07/8307notw3.html


  •     Funded by NIH, and with the agreement of numerous chemical supplier companies, ZINC can be used with numerous docking programs. Thus ZINC is effectively a "ready to dock" database. Shoichet and Irwin have produced three-dimensional structures from two-dimensional information, weeded out insoluble forms, and calculated properties such as protonation states and number of rotatable bonds.
62
ZINC

  • Links:


  • Chemical Names
  • SMILES
63
Areas of Content
  • Category 1: (Literature and/or Patent Links)


  •  Beilstein
  •  Chemical Abstracts
  •  Chemisches Zentralblatt
  •  Derwent Chemistry Resource
  •  DiscoveryGate
  •  GVK BIO
  •  IBM Patent Database
  •  Index Chemicus
  •  PubChem
  •  SureChem
  •  SPRESI
64
Areas of Content
  • Category 2: (Chemical Catalogs/Information)


  • Ambinter
  • ChemDB
  • ChemSpider
  • ChemNavigator
  • eMolecules
  • GDB - Berne
  • NCI
  • QueryChem
  • Ryan Scientific
65
Areas of Content
  •   Category 3: (Containing data – chemical reactions alone are not considered data)



  •  Beilstein
  •  DiscoveryGate
  •  eMolecules
  •  PubChem
66
Data Elements/Content/Characteristics
  • Free: Ambinter, ChemDB, ChemNavigator, ChemSpider, eMolecules, GDB – Berne, IBM Patents, NCI, PubChem, QueryChem, Ryan Scientific, SureChem, ZINC
  • Fee: CAS, Chemisches Zentralblatt, CrossFire Beilstein, Derwent, DiscoveryGate, GVK BIO, Index Chemicus, SPRESI
  • Download: Ambinter (partial), ChemDB, GDB – Berne, PubChem, Ryan Scientific
67
Data Elements/Content/Characteristics
  •   CAS RN                 SMILES                InChI                     Names
  • (all partial except CAS)
  •  Bio-Rad                                                                             Bio-Rad
  • ChemDB                   ChemDB                                           ChemDB
  • CAS                                                                                    CAS
  •                                                                                            Chemisches Zentralblatt
  • ChemNavigator        ChemNavigator                                 ChemNavigator
  • ChemSpider             ChemSpider      ChemSpider            ChemSpider
  • CrossFire Beilstein                            CrossFire Beilstein   CrossFire Beilstein
  •                                                                                            Derwent
  • DiscoveryGate         DiscoveryGate    DiscoveryGate        DiscoveryGate
  •                                  eMolecules                                        eMolecules
  •                                                                                            GDB
  •                                  GVK BIO
  •                                  IBM Patents        IBM Patents           IBM Patents
  •                                                                                            Index Chemicus
  • NCI                           NCI                      NCI                        NCI
  •                                  QueryChem                                       QueryChem
  • PubChem                 PubChem           PubChem                PubChem
  •                                                                                             Ryan Scientific
  •                                   SureChem         SureChem              SureChem
  •                                                                                            SPRESI
  •                                                                                            Thomson Pharma
  •  ZINC                       ZINC                   ZINC                       ZINC



68
Data Elements/Content/Characteristics
  • Patents                 Data                 Reactions              Predictions


  • CAS                                                                 CAS                               CAS (some)
  • Chemisches Zentralblatt
  •                                                                                                                ChemSpider
  • CrossFire Beilstein   CrossFire Beilstein      CrossFire Beilstein           CrossFire Beilstein
  • Derwent
  • DiscoveryGate           DiscoveryGate           DiscoveryGate                   DiscoveryGate
  • GVK BIO
  • IBM Patents
  • Index Chemicus                                           Index Chemicus
  •                                    NCI                                                                      NCI (some)
  • SureChem
  •                                                                      SPRESI
  •                                                                                                                 ZINC


69
InChI Goal

The goal of the IUPAC Chemical Identifier Project is to create a unique label, the IUPAC International Chemical Identifier (InChI): an open-source, non-proprietary identifier for chemical substances that can be used in printed and electronic data sources, thus enabling easier linking of diverse data and information compilations.
70
InChI Characteristics
  • 1. Easy to generate (It will use existing software.)
  • 2. Expressive (It will contain structural information.)
  • 3. Unique/Unambiguous
  • 4. Easy to search for structures via Internet search engines (Google, Yahoo, Microsoft Live, etc.) using the hashed InChIKey.
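Item 4 works because a fixed-length, all-letter key is easy to spot in ordinary text. A minimal sketch, assuming the 14-letter/hyphen/10-letter layout of the keys shown in this talk (other key versions may differ): it pulls key-shaped tokens out of a page and groups them by the first (connectivity) block, using the D- and L-fructose keys quoted on the later slides. `KEY_PATTERN`, `find_keys` and `group_by_skeleton` are illustrative names, not part of any InChI software.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the keys shown on these slides:
# 14 uppercase letters, a hyphen, then 10 uppercase letters.
KEY_PATTERN = re.compile(r"\b([A-Z]{14})-([A-Z]{10})\b")

def find_keys(text: str) -> list[str]:
    """Return every InChIKey-shaped token found in free text (e.g. a fetched web page)."""
    return [m.group(0) for m in KEY_PATTERN.finditer(text)]

def group_by_skeleton(keys: list[str]) -> dict[str, list[str]]:
    """Group keys by their first 14-letter block, so stereoisomers land together."""
    groups: dict[str, list[str]] = defaultdict(list)
    for key in keys:
        groups[key[:14]].append(key)
    return dict(groups)

page = ("D-fructose InChIKey=BJHIKXHVCXFQLS-UYFOZJQFBH; "
        "L-fructose InChIKey=BJHIKXHVCXFQLS-FUTKDDECBR")
keys = find_keys(page)
print(group_by_skeleton(keys))
# {'BJHIKXHVCXFQLS': ['BJHIKXHVCXFQLS-UYFOZJQFBH', 'BJHIKXHVCXFQLS-FUTKDDECBR']}
```

Searching on the first block alone is what lets a single query find all stereoisomers of one skeleton.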
71
 
72
 
73
 
74
Really, Really Long InChI (Palytoxin)
75
D-Fructose (natural)

InChI=1/C6H12O6/c7-1-3(9)5(11)6(12)4(10)2-8/h3,5-9,11-12H,1-2H2/t3-,5-,6-/m1/s1

InChIKey=BJHIKXHVCXFQLS-UYFOZJQFBH


L-Fructose

InChI=1/C6H12O6/c7-1-3(9)5(11)6(12)4(10)2-8/h3,5-9,11-12H,1-2H2/t3-,5-,6-/m0/s1

InChIKey=BJHIKXHVCXFQLS-FUTKDDECBR
76

Fructose
D-Fructose
InChIKey: BJHIKXHVCXFQLS-UYFOZJQFBH

Fructose
L-Fructose
InChIKey: BJHIKXHVCXFQLS-FUTKDDECBR


Note: First 14 characters of BOTH InChIKeys are the SAME!

The 1st block (14 characters) encodes the connectivity.
The 2nd block (8 characters) encodes proton positions (tautomers), stereochemistry, isotopes, etc.
Check Character
Flag Character – InChI version, presence/absence of stereo info/isotopes, etc.
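The two-block idea can be illustrated with a toy hash: hash the formula plus connectivity (/c) layer into one block and all remaining layers into another, so D- and L-fructose, whose InChIs differ only in the /m layer, share the first block. This is a hypothetical sketch, NOT the real InChIKey algorithm (which has its own layer split, hash truncation, encoding, flag and check characters), so it will not reproduce the keys shown above; `letters` and `toy_key` are illustrative names.

```python
import hashlib

def letters(data: str, n: int) -> str:
    """Encode the SHA-256 digest of `data` as n uppercase letters (repeated base-26 digits)."""
    value = int.from_bytes(hashlib.sha256(data.encode("ascii")).digest(), "big")
    chars = []
    for _ in range(n):
        value, digit = divmod(value, 26)
        chars.append(chr(ord("A") + digit))
    return "".join(chars)

def toy_key(inchi: str) -> str:
    """Hypothetical 14+8 letter key; NOT the real InChIKey algorithm."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("expected an InChI string")
    layers = inchi[len(prefix):].split("/")  # layers[0] is the version
    formula, rest = layers[1], layers[2:]
    # Block 1: formula plus connectivity (/c) layer, i.e. the skeleton.
    skeleton = "/".join([formula] + [layer for layer in rest if layer.startswith("c")])
    # Block 2: everything else (H positions, stereo /t /m /s, isotopes, ...).
    details = "/".join(layer for layer in rest if not layer.startswith("c"))
    return f"{letters(skeleton, 14)}-{letters(details, 8)}"

d_fructose = "InChI=1/C6H12O6/c7-1-3(9)5(11)6(12)4(10)2-8/h3,5-9,11-12H,1-2H2/t3-,5-,6-/m1/s1"
l_fructose = "InChI=1/C6H12O6/c7-1-3(9)5(11)6(12)4(10)2-8/h3,5-9,11-12H,1-2H2/t3-,5-,6-/m0/s1"
kd, kl = toy_key(d_fructose), toy_key(l_fructose)
print(kd[:14] == kl[:14], kd[15:] == kl[15:])  # True False
```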
77
Stereoisomers of menthol
78
InChIKey – collision resistance
  • Like any hash, it may not be unique for HUGE datasets
  • Estimated resistance (corresponds to ½ probability of a SINGLE collision):
    • 1st block:  6.1×10^9 molecular skeletons
    • 2nd block:  3.7×10^5 stereo/tauto/isotopomers per skeleton


  • Number of molecules in current databases: ~(3–4)×10^7


  • Testing:
    • internal: up to 7.7×10^7 molecules
    • independent: by ChemSpider (http://www.chemspider.com), 1.7×10^7 real molecules
    • No collisions found.
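The resistance figures above match the square-root "birthday" scale √N, where N is the number of possible values of a block. A worked check, assuming (not stated on the slide) that the first block carries 65 bits and the second 37 bits; it also estimates the chance of any first-block collision among the ~4×10^7 molecules in current databases, using the standard approximation p ≈ 1 − exp(−n²/2N). The function names are illustrative.

```python
import math

def birthday_scale(bits: int) -> float:
    """sqrt(2**bits): the rough number of random hashes at which a collision becomes likely."""
    return math.sqrt(2 ** bits)

def collision_probability(n: float, bits: int) -> float:
    """Approximate P(at least one collision) among n random `bits`-bit hashes: 1 - exp(-n^2 / 2N)."""
    return -math.expm1(-(n * n) / (2 * 2 ** bits))

print(f"{birthday_scale(65):.2e}")              # 6.07e+09
print(f"{birthday_scale(37):.2e}")              # 3.71e+05
print(f"{collision_probability(4e7, 65):.1e}")  # 2.2e-05
```

At exactly √N hashes the formula gives about 0.39, so √N is the order of magnitude of the risk rather than the precise 50% point.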


79
Critical InChI Adopters
  • Publishers:


  • Royal Society of Chemistry www.rsc.org/Publishing/Journals/ProjectProspect/
  • Prous Science - Drugs of the Future
  • BioMed Central - Chemistry Central www.chemistrycentral.com


  • Other:


  • 1. European Patent Office (EPO)
80
InChI URLs
Main IUPAC InChI page: http://iupac.org/inchi/

InChI Google video lecture (11/06):
http://video.google.com/videoplay?docid=-6653695245776470969&q=heller+chemical

InChI Google video lecture (10/07): http://youtube.com/watch?v=F9XppyZg4E4

B. Kosata (Prague):
www.inchi.info

P. Murray-Rust/Nick Day (Cambridge): http://wwmm.ch.cam.ac.uk/inchifaq/

ChemSpider:
http://www.chemspider.com/inchi.asmx
81
Summary - Overall Features of InChI (1)
  • 1. InChI is the only publicly available method for creating a unique chemical identifier for a given chemical structure. In addition, InChI has a number of other valuable attributes, noted below.

    2. InChI is free, open-source software. (Web 2.0)

    3. Any organization (public or private) can use it for internal and/or external structure files at no cost. (Web 2.0)
  • Web 2.0 is the second generation of web-based communities and hosted services, such as social-networking sites, which facilitate collaboration and sharing between users. Web 1.0 is where information comes from one central source.
82
Summary - Overall Features of InChI (2)
  •     4. It is sponsored by IUPAC and primarily implemented by the US scientific standards agency – NIST.

    5. It allows the chemistry community to use the InChIKey as a universal chemical identifier. This means InChIs can be freely searched for, via their InChIKeys, using Google/Yahoo/Microsoft Live and other Internet search engines. (Web 2.0)

    6. The InChIKey unlocks the data and information from all sites around the world that choose to use it. The InChIKey allows all those commercial chemical information providers (e.g., Elsevier, Thomson, Prous Science, and John Wiley) to have a free structure and number/linking system. (Web 2.0)


83
 Acknowledgments
  • Philip Abrahams, Steve Bachrach, Steve Boyer, Colin Batchelor, Ted Becker, Jost Bohlen, Pieter Bolman, Evan Bolton, Bob Bovenschulte, Steve Bryant, Harry Collier, Alice Cooper,  Nick Day, Rene Deplanque, Ron Dunn, Simon Quellen Field, Guenter Grethe, Stevan Harnad, Wolf-Dietrich Ihlenfeldt, Sami Kassab, Richard Kidd, Sandy Lawson, David Lipman, Gary Mallard, Randy Marcinko, Bill Milne, Carmen Nitsche, Josep Prous, Chris Reed, Rich Roberts, Peter Murray-Rust, Henry Rzepa,  Peter Shepherd, Bill Town, Andrea Twiss-Brooks, Don Walters, Wendy Warr, Tony Williams, and Ann Wolpert