1
The IUPAC InChI project –
A Status Report
  • Stephen Heller
  • InChI-Trust Project Director
  • steve@inchi-trust.org
2
  • The slides from this presentation can be found at:


  • http://www.hellers.com/steve/pub-talks/(San Francisco 2010 link)
  • and at the new InChI Trust web site


  • The main web sites for the IUPAC InChI project are:


  • http://www.iupac.org/inchi


  • and


  • http://www.inchi-trust.org
3
Disclaimer
 These slides were made from 100% recycled electrons.

This will be a well balanced presentation.
I have a chip on both shoulders.

I will try to be politically correct, and have even taken a course in being PC. But I flunked it, twice.
4
Overview
5
Objective
  • The objective of the IUPAC Chemical Identifier Project is to create a unique label, the IUPAC Chemical Identifier (InChI): an Open Source, freely available, non-proprietary identifier for well-defined chemical substances. It can be used in printed and electronic data sources, making it easier to LINK and work with diverse data and information compilations.
6
Why Use InChI?
  • For publishers and database providers, using InChI gives a competitive advantage: the ability to LINK content from multiple sources. It helps users make new discoveries from existing information and data by making that content easy to integrate, remix, and retell. Business models that depend on things not changing (closed access, and making aggregation and integration difficult) are so 20th century. InChI is a small but vital part of new business models and technologies involving chemicals that will lead to new discoveries. Combinability increases the value of information and data.
7
  • "In my view, the most important rule of business in today's integrated and digitized global market, where knowledge and innovation tools are so widely distributed. It's this: Whatever can be done, will be done. The only question is will it be done by you or to you.  Just don't think it won’t be done."

    By Thomas L. Friedman
    Published: December 9, 2008
    NY Times

    (That is: not if, but when and by whom. The web has made the impossible possible.)
8
I skate to where chemistry is going to be, not where it has been.

With apologies to ice hockey great Wayne Gretzky.
9
The Internet has made the world more homogeneous for chemical information, and the Open Source InChI/InChIKey is not affected by global boundaries or proprietary chemical structure representations.
10
InChI is an agent of change for those individuals and organizations who have defined chemical structures and want to make their information known and available to the community.
11
InChI Technology | Other Technology
12
Critical factors for the success of the InChI project

  • Technically competent staff
  • Fulfill a real community need
  • Political and Financial Support
13
Technical: InChI is a unique representation/identifier for defined chemical structures. Probably marginally better than previous ones. The InChI algorithm was built on the shoulders of giants.
 http://en.wikipedia.org/wiki/Graph_theory

Practical: InChI and the related hash-code compressed InChIKey are the only available universal LINKs for in-house and public databases of defined chemical structures.
14
InChI is the worst computer readable structure representation except for all those other forms that have been tried from time to time.

With apologies to Sir Winston Churchill
(House of Commons speech on Nov. 11, 1947)
15
Initial InChI Goal (Plan A)
- Cover 100% of all chemicals found in the literature, in databases, and in other sources.


Current InChI Goal (Plan B)
- Cover 99.9% of all chemicals found in the literature, in databases, and in other sources.
16
Bar Codes – not designed to be read by humans

InChI – not designed to be read by humans
17
Why InChI is becoming a success

1. Organizations need a structure representation for their content (databases, journals, chemicals for sale, products, and so on) so that their content can be LINKED to and combined with other content on the Internet.

2. InChI is a public domain algorithm that anyone, anywhere can freely use.
18
How do we know the InChI project is beneficial?

Success is uncoerced adoption
19
InChI have some advantages over other chemical identifiers developed before:

(1) They are freely useable and non-proprietary.

(2) They allow a more advanced representation of chemical information than other codes (such as the SMILES code).

(3) They are unambiguous, i.e. conversion of chemical structures using standardized algorithms only leads to one InChI.

(4) They are precisely indexed by major search engines such as Google.

However, InChI are not applicable to generic formats often disclosed in patent literature, such as Markush structures, since they were rather designed to represent specific chemical structures and compounds. InChI therefore are not yet useful for comprehensive retrieval of patent literature.



Excerpt taken from:
Full-text prior art and chemical structure searching in e-journals and on the internet – A patent information professional’s perspective
World Patent Information, Volume 31, Issue 4, December 2009, Pages 278-284
Maik Annies
20
The best way to represent a chemical compound is not by a name or even a database identifier, but by its structure encoded in Structure Data Format (SDF MDL V2000) or the open Chemical Markup Language (CML) format or InChI codes. A few databases already provide the IUPAC/NIST standard of InChI codes or the shorter hashed InChIKey. The new InChIKey resolver services implemented by the Royal Society of Chemistry (RSC) and Chemspider allows to create InChIKeys from molecular structures and a reverse lookup of InChIKeys to obtain the associated known structures from molecular databases. The InChIKey can be used for web based literature search and also for chemical database search and merging of compound lists from multiple sources. Some other databases support the SMILES code for structures. The use of SMILES code is not recommended because multiple vendors create different representations of the SMILES code. Also true canonical (unique) SMILES are vendor specific.


Extracted from:
Kind T, Scholz M, Fiehn O:
How large is the metabolome? A critical analysis of data exchange practices in chemistry.
PLoS One 2009, 4:e5440.
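
The merging of compound lists by InChIKey mentioned in the extract can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the cited authors: the source names, the local identifiers, and the function name merge_by_inchikey are hypothetical, and the three keys are believed to be the standard InChIKeys of ethanol, caffeine, and water. The pattern checks only the 27-character shape of a key (blocks of 14, 10, and 1 uppercase letters separated by hyphens); it does not verify the hash itself.

```python
import re
from collections import defaultdict

# Shape of an InChIKey: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def merge_by_inchikey(*sources):
    """Combine (source_name, records) pairs into one dict keyed by InChIKey.

    Each record is an (inchikey, local_id) tuple. The result maps every
    InChIKey to a dict of {source_name: local_id}.
    """
    merged = defaultdict(dict)
    for source_name, records in sources:
        for inchikey, local_id in records:
            if not INCHIKEY_PATTERN.fullmatch(inchikey):
                raise ValueError(f"not shaped like an InChIKey: {inchikey!r}")
            merged[inchikey][source_name] = local_id
    return dict(merged)


# Hypothetical compound lists from two unrelated sources.
vendor_catalog = [
    ("LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "CAT-001"),     # ethanol
    ("RYYVLZVUVIJVGH-UHFFFAOYSA-N", "CAT-002"),     # caffeine
]
journal_index = [
    ("RYYVLZVUVIJVGH-UHFFFAOYSA-N", "article-17"),  # caffeine
    ("XLYOFNOQVPJJNP-UHFFFAOYSA-N", "article-42"),  # water
]

merged = merge_by_inchikey(
    ("vendor_catalog", vendor_catalog),
    ("journal_index", journal_index),
)

# Keys that occur in more than one source are the LINKs between them.
shared = sorted(key for key, hits in merged.items() if len(hits) > 1)
print(shared)
```

Because the key is a fixed-length string, the join needs no chemistry toolkit at all: any system that can compare strings can LINK the two lists.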
21
The idea is to create a mechanism that would allow CrossRef publishers to record InChIs in their submitted CrossRef metadata. This, in turn, would allow us to provide a service that would allow users to:

Look up the published articles that mention a particular InChI.
Look up the InChIs mentioned in a published article.
…

The following is a demonstrator of what a DOI2InChI lookup service might look like. Please note that the XML representation of the results is very basic and is not best-practice for linked-data.

The demonstrator currently only holds DOIs and InChIs for a few publishers.

A summary of the contents of the database can be found on the status page
http://inchi.crossref.org/status

A list of all the CrossRef DOIs that contain InChIs can be seen here:
http://inchi.crossref.org/dois

A list of all the InChIs that have been registered with CrossRef can be seen here:
http://inchi.crossref.org/inchis

The system provides the following API calls:

Return all the DOIs that have been registered with a given InChI
http://inchi.crossref.org/dois/InChI=1S/C4H6O2/c1-3-6-4(2)5/h3H,1H2,2H3

Return all the InChIs that have been registered for a given DOI
http://inchi.crossref.org/inchis/10.1038/nchem.215
© 2009 CrossRef (CrossRef Labs web site for CrossRef experimental R&D)
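
The two API calls above follow a simple URL pattern, which can be sketched as below. This is a hedged illustration only: the function names are hypothetical, the percent-encoding of unusual characters is an assumption (the slide shows only unencoded examples), and the demonstrator host may no longer be available, so the code builds the URLs without contacting the service.

```python
from urllib.parse import quote

# Demonstrator host named on the slide; it may no longer be live.
BASE_URL = "http://inchi.crossref.org"


def dois_for_inchi_url(inchi: str) -> str:
    """Build the URL that lists the DOIs registered with a given InChI."""
    if not inchi.startswith("InChI="):
        raise ValueError("expected a string starting with 'InChI='")
    # Assumption: characters seen in the slide's example stay unencoded;
    # anything else (for example '?' or '+') is percent-encoded.
    return f"{BASE_URL}/dois/{quote(inchi, safe='/=,()')}"


def inchis_for_doi_url(doi: str) -> str:
    """Build the URL that lists the InChIs registered for a given DOI."""
    if not doi.startswith("10."):
        raise ValueError("expected a DOI starting with '10.'")
    return f"{BASE_URL}/inchis/{quote(doi, safe='/')}"


# The two examples given on the slide.
print(dois_for_inchi_url("InChI=1S/C4H6O2/c1-3-6-4(2)5/h3H,1H2,2H3"))
print(inchis_for_doi_url("10.1038/nchem.215"))
```

For the slide's own examples the functions reproduce the URLs exactly as shown above.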
22
 
23
The LINKED, Interoperable, and Combinable World of InChI
24
 InChI Trust Organization
25
 
26
InChI Trust Membership

With the needs of NIST fulfilled with respect to what capabilities of an InChI are required for NIST databases, and since IUPAC is fundamentally and culturally a volunteer organization, there needs to be a way to continue development of InChI and maintain the InChI algorithm. As a result of numerous meetings, emails, and discussions, it was concluded that a not-for-profit organization would best fit the project needs. Hence the decision to create and incorporate the "InChI Trust" in the UK. As there is no "free lunch", the Trust will need resources to continue to operate. Membership in the InChI Trust requires annual dues. The income from these dues will be used exclusively for InChI development, maintenance, and educational activities associated with the project. Membership will entitle a member to influence the direction, priority, and speed of further Trust activities. Membership will also provide InChI Trust "certification" of the InChIs and InChIKeys in a member's database. Those organizations which do not join the InChI Trust will still have free access to the InChI algorithms but will not participate in any decision-making or direction-setting activities.
27
Summary

If you are not part of the solution, you are part of the precipitate.
28



Current InChI Trust Members

ACD Labs
ChemAxon
Dialog
Elsevier
FIZ CHEMIE – Berlin
Informa/Taylor & Francis
IUPAC
John Wiley & Sons
Microsoft
Nature Publishing Group
OpenEye
Royal Society of Chemistry (RSC)
Symyx
Thomson-Reuters

3/7/2010
29
Acknowledgements

(Primarily members of the IUPAC InChI subcommittee and associated InChI working groups)


Steve Bachrach, Colin Batchelor, John Barnard, Evan Bolton, Steve Boyer, Steve Bryant, Szabolcs Csepregi, Rene Deplanque, Nicko Goncharoff, Jonathan Goodman, Guenter Grethe, Richard Hartshorn, Jaroslav Kahovec, Richard Kidd, Hans Kraut, Alexander Lawson, Peter Linstrom, Bill Milne, Gerry Moss, Peter Murray-Rust, Heike Nau, Marc Nicklaus, Carmen Nitsche, Matthias Nolte, Igor Pletnev, Josep Prous, Hinnerk Rey, Ulrich Roessler, Roger Schenck, Martin Schmidt, Steve Stein, Peter Shepherd, Markus Sitzmann, Chris Steinbeck, Keith Taylor, Dmitrii Tchekhovskoi, Bill Town, Wendy Warr, Jason Wilde, Tony Williams, Andrey Yerin.

Special Acknowledgement: Ted Becker & Alan McNaught for their vision and leadership of the future of IUPAC nomenclature.