|
1
|
- Stephen Heller, Stephen Stein, & Dmitrii Tchekhovskoi
- Physical & Chemical Properties Division
- NIST
- Gaithersburg, MD
- srheller@nist.gov
|
|
8
|
- Chemical structure is the true ‘identifier’
- But structure representations are neither unique nor convenient for computers
- So, convert the structure to a unique ‘name’ using fixed algorithms
- The IUPAC International Chemical Identifier (InChI)
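To show why a fixed algorithm matters, here is a minimal sketch under stated assumptions. It uses a hypothetical toy graph made of heavy-atom element symbols plus `(atom, atom)` bonds, with bond orders ignored. The same molecule entered in two atom orders gives two different naive strings. A fixed rule ("try every numbering, keep the smallest string") gives one name. The brute-force rule and the names `serialize` and `canonical_name` are illustrative only. They are not the InChI algorithm, and the cost grows factorially with atom count.

```python
from itertools import permutations

# Hypothetical toy representation: element symbols (heavy atoms only) and
# bonds as (atom_index, atom_index) pairs; bond orders are ignored.

def serialize(elements, bonds, order):
    """Write the graph as a string under one atom numbering.

    order[new] is the original index of the atom that receives label `new`.
    """
    relabel = {old: new for new, old in enumerate(order)}
    atom_part = ".".join(elements[old] for old in order)
    edges = sorted(tuple(sorted((relabel[a], relabel[b]))) for a, b in bonds)
    bond_part = ",".join(f"{a}-{b}" for a, b in edges)
    return f"{atom_part}/{bond_part}"


def canonical_name(elements, bonds):
    """Fixed rule: try every numbering and keep the smallest string.

    The result depends only on the molecule, not on the input atom order,
    but the work grows factorially, so this is usable only for tiny graphs.
    """
    return min(serialize(elements, bonds, order)
               for order in permutations(range(len(elements))))


# Ethanol (heavy atoms only) entered in two different atom orders
ethanol_a = (["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_b = (["O", "C", "C"], [(0, 1), (1, 2)])

print(serialize(*ethanol_a, order=(0, 1, 2)))  # C.C.O/0-1,1-2
print(serialize(*ethanol_b, order=(0, 1, 2)))  # O.C.C/0-1,1-2
print(canonical_name(*ethanol_a))              # C.C.O/0-1,0-2
print(canonical_name(*ethanol_b))              # C.C.O/0-1,0-2
```

The naive strings differ only because the input order differs. Both canonical names agree. Dimethyl ether (`["C", "O", "C"]`) has the same atoms but gives a different name.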
|
|
9
|
- Chemicals
  - Fast isomerization (tautomerization)
  - Ill-defined connectivity
- Chemists
  - Differing conventions
  - Depends on discipline, education and convenience
  - Imprecision/uncertainty
|
|
10
|
- Chemistry
  - ‘Normalize’ Input Structure
- Math
  - ‘Canonicalize’ (label the atoms)
  - Equivalent atoms get the same label
- Format
  - ‘Serialize’ Labeled Structure
  - Output as character string (‘name’)
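The "equivalent atoms get the same label" step can be sketched with iterative label refinement, in the Morgan / Weisfeiler-Lehman style. This is a swapped-in textbook technique, not InChI's actual canonicalization. The toy input format and the function names are hypothetical. Each atom starts from its element and neighbor count. Its label is then repeatedly refined using its neighbors' labels until no class splits further.

```python
# Hypothetical toy input: element symbols plus (atom, atom) bonds, orders ignored.

def _compress(signatures):
    """Replace each signature by its rank among the distinct signatures."""
    rank = {sig: i for i, sig in enumerate(sorted(set(signatures)))}
    return [rank[sig] for sig in signatures]


def equivalence_labels(elements, bonds):
    """Refine atom labels until the partition into classes stops changing.

    Atoms with the same final label are indistinguishable to this procedure.
    In rare, highly symmetric graphs it can merge atoms that are not truly
    equivalent, so it only approximates true symmetry classes.
    """
    n = len(elements)
    neighbors = [[] for _ in range(n)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Start from a local invariant: element symbol and number of neighbors.
    labels = _compress([(elements[i], len(neighbors[i])) for i in range(n)])
    while True:
        # New signature = own label plus the sorted labels of the neighbors.
        signatures = [
            (labels[i], tuple(sorted(labels[j] for j in neighbors[i])))
            for i in range(n)
        ]
        refined = _compress(signatures)
        if len(set(refined)) == len(set(labels)):
            return refined  # no class was split: the partition is stable
        labels = refined


# Propan-2-ol, heavy atoms only: CH3(0)-CH(1)(-CH3(2))-OH(3)
print(equivalence_labels(["C", "C", "C", "O"], [(0, 1), (1, 2), (1, 3)]))
# [0, 1, 0, 2]: the two methyl carbons share a label
```

Each signature includes the atom's previous label, so every round can only split classes, never merge them. The loop therefore ends after at most as many rounds as there are atoms.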
|
|
11
|
- Divide structure into ‘layers’
  - Each layer ‘refines’ structure
- Ignore ‘Electron Density’
  - Use simple ‘connectivity’ only
  - Ignore bond type and electron location
- Stereochemistry
  - sp2 and sp3 only
  - Free rotation around single bonds
  - No Z/E stereo for small rings (default)
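Two of the points above can be sketched directly: a formula layer, and a connectivity layer that ignores bond type. The formula uses Hill order: carbon, then hydrogen, then the other elements alphabetically, or all elements alphabetically when there is no carbon. InChI's formula layer follows this convention, e.g. `C2H6O` for ethanol. The `(atom, atom, bond_order)` input and the function names are hypothetical illustrations. The point shown is that two Kekulé structures of benzene, which differ only in where the double bonds are drawn, give the same connectivity.

```python
from collections import Counter

# Hypothetical toy input: element symbols, implicit hydrogen count per atom,
# and bonds as (atom, atom, bond_order) triples.

def hill_formula(elements, hydrogens):
    """Formula in Hill order: C, then H, then others alphabetically
    (everything alphabetically, H included, when there is no carbon)."""
    counts = Counter(elements)
    if sum(hydrogens):
        counts["H"] += sum(hydrogens)
    if "C" in counts:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


def connectivity(bonds):
    """Keep only which atoms are joined; the bond order is discarded."""
    return sorted({tuple(sorted((a, b))) for a, b, _order in bonds})


# Two Kekule structures of benzene: same ring, alternating bond orders swapped.
kekule_1 = [(0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 0, 1)]
kekule_2 = [(0, 1, 1), (1, 2, 2), (2, 3, 1), (3, 4, 2), (4, 5, 1), (5, 0, 2)]

print(hill_formula(["C"] * 6, [1] * 6))                 # C6H6
print(connectivity(kekule_1) == connectivity(kekule_2))  # True
```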
|
|
19
|
- How can you represent chemistry without electrons?
  - Chemistry is not represented, just identity
  - Whole-molecule properties are easily added
- Do big molecules have big InChIs?
  - Yes, just like systematic names
- How are other tautomers, substructures, etc. handled?
- Is InChI reversible?
  - Partly: it contains only the data needed for ‘naming’
  - Auxiliary fields can carry other information
- Is InChI extensible?
  - New layers can add refinement
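Layering is also what keeps InChI extensible: a reader can split the string into layers and pass through any layer it does not recognize. The sketch below assumes the string starts with `InChI=` and that layers are separated by `/`. The version comes first and the formula second, and every later layer begins with a one-letter prefix, e.g. `c` for connections and `h` for hydrogens. The function name `split_layers` is hypothetical. It handles only the InChI string itself, not the separate auxiliary information that the IUPAC software emits alongside it. The example string is the standard InChI for ethanol.

```python
def split_layers(inchi):
    """Split an InChI string into (prefix, content) pairs, in order.

    Pairs are kept as a list rather than a dict, so a repeated prefix is
    not lost, and unrecognized prefixes are passed through unchanged.
    """
    head = "InChI="
    if not inchi.startswith(head):
        raise ValueError("expected a string starting with 'InChI='")
    version, *layers = inchi[len(head):].split("/")
    pairs = [("version", version)]
    if layers:
        pairs.append(("formula", layers[0]))  # formula layer has no prefix letter
        for layer in layers[1:]:
            if layer:  # skip empty segments
                pairs.append((layer[0], layer[1:]))
    return pairs


ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
for prefix, content in split_layers(ethanol):
    print(prefix, content)
# version 1S
# formula C2H6O
# c 1-2-3
# h 3H,2H2,1H3
```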
|
|
20
|
- Identify compounds at the known level of detail
- Convention-free (mostly)
- Generate quickly from structure
- Contains all essential connectivity information
- Simple ASCII representation
|
|
21
|
- NIH/NCBI/PubChem project – 700,000+ structures
- NCI Database – 250,000+ structures
- EPA – DSSTox database
- KEGG database – 9,584 structures
|
|
23
|
- Available from Nick Day, Cambridge University, UK:
  - ned24@cam.ac.uk
|
|
25
|
- Future versions of InChI could include, for example, phase information, crystal structure, conformations, electronic states, and additional classes of stereochemistry.
- First additional project: Investigate adding polymers to InChI
|
|
26
|
- Steve Bachrach, Steve Bryant, Nick Day, Rene Deplanque, Gary Mallard, Alan McNaught, Bill Milne, Peter Murray-Rust, Henry Rzepa, Peter Shepherd, and Bill Town
|