|
1
|
|
|
2
|
|
|
3
|
|
|
4
|
- Stephen R. Heller
- Alan McNaught
- Igor Pletnev
- Stephen E. Stein
- Dmitrii Tchekhovskoi
|
|
5
|
- 1. Background/History/Objective
- 2. Development of InChI
- 3. InChIKey
- 4. InChI Adoption & Use
- 5. Conclusion
|
|
6
|
|
|
7
|
|
|
8
|
|
|
9
|
- Using an InChI/InChIKey, you can search knowing you will find a match if it
is there, without worrying whether it was coded differently by another
person or program.
- The InChI/InChIKey is a system
for both public and private (fee-based) sources.
|
|
10
|
- Using InChI means you can freely exchange and/or link structure files with
others within your organization, and with any person or organization
anywhere in the world, knowing that the structure name, the InChI/InChIKey,
will be the same. You can search for the InChI/InChIKey on the Internet
using Microsoft Live Search and other search engines (e.g., Yahoo and
Google).
|
|
11
|
|
|
12
|
|
|
13
|
- 1. Easy to generate (It will use existing software.)
- 2. Expressive (It will contain structural information.)
- 3. Unique/Unambiguous
- 4. Easy to search for structure via Internet search engines (Google,
Yahoo, Microsoft Live, etc.) using the InChI (hash) Key.
|
|
14
|
|
|
15
|
- Cooperation with and support of the chemical, biochemical, pharma, life
sciences, healthcare, and scientific publishers in the development and
use of the InChI/InChIKey
|
|
16
|
|
|
17
|
|
|
18
|
|
|
19
|
|
|
20
|
|
|
21
|
|
|
22
|
|
|
23
|
- The InChI string has been found to be too long for Internet search engines
to use, hence the need for a fixed-length InChIKey. The InChIKey is a
25-character hash code of the InChI string (14 + 8 = 22 hash characters,
plus 1 check character, 1 flag character, and 1 dash). It is made up of
four (4) parts:
- AAAAAAAAAAAAAA-BBBBBBBBCD
- 14 characters for the basic structure
- 8 characters for the layers
- 1 character is a “check” character
- 1 character is a flag indicating certain features (e.g., fixed or not
fixed hydrogen atoms)
- A hash code is a fixed-length condensed digital representation of a
variable-length character string.
- The InChIKey is based on a truncated SHA-256 cryptographic hash function
(http://en.wikipedia.org/wiki/SHA-2).
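The idea of a fixed-length hash of a variable-length string can be sketched in a few lines of Python. This is an illustration only: the function name `toy_fixed_length_hash` and the base-26 letter encoding are assumptions made for the sketch, and it does not reproduce the official InChIKey encoding, so its output will not match real InChIKeys.

```python
import hashlib


def toy_fixed_length_hash(text: str, n_letters: int = 14) -> str:
    """Map a string of any length to exactly n_letters uppercase letters.

    Illustration only: this is NOT the official InChIKey algorithm.
    """
    # SHA-256 always yields 32 bytes, whatever the input length.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")
    letters = []
    # Truncate by taking only n_letters base-26 digits of the digest.
    for _ in range(n_letters):
        value, remainder = divmod(value, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(letters)


short_input = "InChI=1/CH4/h1H4"
long_input = "InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3"
print(toy_fixed_length_hash(short_input))
print(toy_fixed_length_hash(long_input))
```

Both calls return a 14-letter string even though the inputs differ greatly in length, which is the property that lets a key be stored in a fixed-length field.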
|
|
24
|
- The principal new features of the InChIKey are:
- A fixed-length (25-character) condensed digital representation of the
Identifier, to be known as InChIKey. In particular, this will
- * facilitate web searching, previously complicated by unpredictable
breaking of InChI character strings by search engines
- * allow development of a web-based InChI lookup service
- * permit an InChI representation to be stored in fixed length fields
- * make chemical structure database indexing easier
- * allow verification of InChI strings after network transmission.
|
|
25
|
- Caffeine:
- InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
- InChIKey=RYYVLZVUVIJVGH-UHFFFAOYAW
- First block (14 letters), encodes molecular skeleton (connectivity):
RYYVLZVUVIJVGH
- Second block (8 letters), encodes proton positions (tautomers),
stereochemistry, isotopes, reconnected layer: UHFFFAOY
- Flag character, indicates InChI version, presence/absence of fixed H
layer, isotopes, and stereochemistry: A
- Check character: W
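The breakdown of the caffeine key can be reproduced with simple string slicing. The sketch below assumes the 25-character layout shown on this slide (14 letters, a dash, 8 letters, one flag letter, one check letter). The names `InChIKeyParts` and `split_inchikey_25` are hypothetical, and the function only checks the shape of the key; it does not verify the check character.

```python
from typing import NamedTuple


class InChIKeyParts(NamedTuple):
    skeleton: str  # first block, 14 letters (connectivity)
    layers: str    # second block, 8 letters (remaining layers)
    flag: str      # 1 flag letter
    check: str     # 1 check letter


def split_inchikey_25(key: str) -> InChIKeyParts:
    """Split a 25-character InChIKey into the four parts described above."""
    # Accept the key with or without the "InChIKey=" prefix.
    if key.startswith("InChIKey="):
        key = key[len("InChIKey="):]
    if len(key) != 25 or key[14] != "-":
        raise ValueError("expected 14 letters, a dash, then 10 letters")
    letters = key[:14] + key[15:]
    if not (letters.isascii() and letters.isalpha() and letters.isupper()):
        raise ValueError("an InChIKey contains only uppercase letters A-Z")
    return InChIKeyParts(key[:14], key[15:23], key[23], key[24])


parts = split_inchikey_25("InChIKey=RYYVLZVUVIJVGH-UHFFFAOYAW")
print(parts.skeleton, parts.layers, parts.flag, parts.check)
```

For the caffeine key this yields the same four values listed on the slide: RYYVLZVUVIJVGH, UHFFFAOY, A, and W.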
|
|
26
|
|
|
27
|
D-Fructose (natural)
InChI=1/C6H12O6/c7-1-3(9)5(11)6(12)4(10)2-8/h3,5-9,11-12H,1-2H2/t3-,5-,6-/m1/s1
InChIKey=BJHIKXHVCXFQLS-UYFOZJQFBH
L-Fructose
InChI=1/C6H12O6/c7-1-3(9)5(11)6(12)4(10)2-8/h3,5-9,11-12H,1-2H2/t3-,5-,6-/m0/s1
InChIKey=BJHIKXHVCXFQLS-FUTKDDECBR
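The two fructose keys share the same first block and differ only after the dash, which is consistent with the first block encoding connectivity and the second block encoding stereochemistry. A minimal sketch of using that property to group stereoisomers follows; the helper names `first_block` and `same_connectivity` are hypothetical, and a matching first block is treated here as strong evidence of the same skeleton, not as proof, since hash collisions are possible in principle.

```python
def first_block(key: str) -> str:
    """Return the part of an InChIKey before the dash (the skeleton block)."""
    # Drop an optional "InChIKey=" prefix, then keep the text before the dash.
    bare = key.split("=", 1)[-1]
    return bare.split("-", 1)[0]


def same_connectivity(key_a: str, key_b: str) -> bool:
    """True when two keys have identical first blocks."""
    return first_block(key_a) == first_block(key_b)


d_fructose = "InChIKey=BJHIKXHVCXFQLS-UYFOZJQFBH"
l_fructose = "InChIKey=BJHIKXHVCXFQLS-FUTKDDECBR"
caffeine = "InChIKey=RYYVLZVUVIJVGH-UHFFFAOYAW"

print(same_connectivity(d_fructose, l_fructose))  # same skeleton
print(same_connectivity(d_fructose, caffeine))    # different skeleton
print(d_fructose == l_fructose)                   # full keys still differ
```

Searching on the first block alone would therefore retrieve both enantiomers, while searching on the full key distinguishes them.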
|
|
28
|
|
|
29
|
|
|
30
|
|
|
31
|
|
|
32
|
- 1. InChI is the only publicly available method for creating a unique
chemical identifier for a given chemical structure. In addition, InChI has
a number of other valuable attributes, noted below.
- 2. InChI is free, open source software.
- 3. Any organization (public or private) can use it for internal
and/or external structure files at no cost.
|
|
33
|
- 4. It is sponsored by IUPAC and primarily implemented by the US scientific
standards agency, NIST.
- 5. It allows the scientific and medical/healthcare community to use the
InChIKey as a universal chemical identifier. This means InChIs can be
freely searched for via Internet search engines.
- 6. The InChIKey unlocks the data and information from all sites
around the world that choose to use it. The InChIKey allows commercial
chemical information providers (e.g., Thieme, Elsevier, Thomson, Prous
Science, and John Wiley) to have a free structure and number/linking
system.
|
|
34
|
- Philip Abrahams, Steve Bachrach, Colin Batchelor, Ted Becker, Jost
Bohlen, Pieter Bolman, Evan Bolton, Steve Bryant, Harry Collier, Alice
Cooper, Nick Day, Rene Deplanque, Ron Dunn, Simon Quellen Field,
Guenter Grethe, Wolf-Dietrich Ihlenfeldt, Sami Kassab, Richard Kidd,
Sandy Lawson, David Lipman, Gary Mallard, Randy Marcinko, Bill Milne,
Carmen Nitsche, Rudy Potenzone, Josep Prous, Chris Reed, Rich Roberts,
Peter Murray-Rust, Henry Rzepa,
Peter Shepherd, Bill Town, Wendy Warr, Tony Williams, and Ann
Wolpert
|