Cheminformatics: SMILES and InChI quirks for salts
[science, cheminformatics, mathematica, chemdraw, chemistry]
Olivia Vanden Assem ’25 asks: Why am I getting inconsistent SMILES, InChI, and InChIKey results for the salt and neutral acid-base representations of ammonium nitrate? Standard implementations have some quirks in the interconversion between SMILES and InChI that can make the neutral and salt forms of the same compound come out different…
Suppose we have ammonium nitrate. We can represent this as two neutral molecules using the following InChI identifier:
neutralInChI = "InChI=1S/NO3.H3N/c2-1(3)4;/h;1H3/q-1;/p+1";
MoleculePlot[%]
If we convert this into SMILES, we also get two neutral molecules:
neutralSMILES = Molecule[neutralInChI]["SMILES"]
(*"[N+]([O-])(=O)O.N"*)
MoleculePlot[%]
But what if we start with the salt form of the compound? This might even be the more faithful representation, since we have reacted the acid (nitric acid) with the base (ammonia) to form the ammonium nitrate salt:
chargedSMILES = "[N+](=O)([O-])[O-].[NH4+]";
MoleculePlot[%]
However, when we try to convert the charged salt specification to InChI, it gets converted to the neutral form!
Molecule[chargedSMILES]["InChI"]
%["ExternalID"] == neutralInChI
(*True*)
MoleculePlot[%%]
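To see what this shared InChI string itself records, it can help to split it into its layers. The sketch below is only string handling: `inchiLayers` is a hypothetical helper name (not a built-in), and it assumes the usual layout of a version, then a formula, then one-letter-tagged layers separated by `/`.

```mathematica
(* Hypothetical helper: split an InChI string into its "/"-separated layers. *)
(* Assumes the layout InChI=<version>/<formula>/<tag><content>/...          *)
(* If a tag occurs more than once, the later occurrence overwrites the earlier. *)
inchiLayers[inchi_String] := Module[{parts},
  parts = StringSplit[StringDelete[inchi, StartOfString ~~ "InChI="], "/"];
  Join[
    <|"version" -> parts[[1]], "formula" -> parts[[2]]|>,
    (* remaining layers: first character is the tag, the rest is the content *)
    Association[(StringTake[#, 1] -> StringDrop[#, 1]) & /@ Drop[parts, 2]]
  ]
]

(* The ammonium nitrate InChI used above *)
inchi = "InChI=1S/NO3.H3N/c2-1(3)4;/h;1H3/q-1;/p+1";
Lookup[inchiLayers[inchi], {"q", "p"}]
(*{"-1;", "+1"}*)
```

The string contains both a charge layer (`/q`) and a proton layer (`/p`), so it is worth checking these two layers when comparing the InChI strings that different tools emit for a salt.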
Yet the InChIKeys generated from the charged specification (by SMILES) and from the neutral specification (either by InChI or SMILES) are different!
Molecule[chargedSMILES]["InChIKey"]
{Molecule[neutralInChI]["InChIKey"], Molecule[neutralSMILES]["InChIKey"]}
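To narrow down where two keys disagree, one option is to compare them block by block. This is a rough sketch: `inchiKeyBlocks` and `differingBlocks` are hypothetical helper names. It assumes the standard 27-character layout as I understand it (a 14-letter block for the core constitution, 8 letters for the remaining layers, two flag characters, and a final protonation character), and it assumes that the `"InChIKey"` property returns an object supporting `["ExternalID"]` in the same way the `"InChI"` property does above.

```mathematica
(* Hypothetical helper: split an InChIKey of the form                 *)
(* XXXXXXXXXXXXXX-YYYYYYYYFV-P into named blocks (layout is assumed). *)
inchiKeyBlocks[key_String] := Module[{blocks = StringSplit[key, "-"]},
  <|
    "skeleton" -> blocks[[1]],                   (* 14-letter first block *)
    "otherLayers" -> StringTake[blocks[[2]], 8], (* first 8 letters of second block *)
    "flags" -> StringTake[blocks[[2]], -2],      (* standard/non-standard flag and version *)
    "protonation" -> blocks[[3]]                 (* final character *)
  |>
]

(* Hypothetical helper: names of the blocks in which two keys differ *)
differingBlocks[k1_String, k2_String] :=
  Keys[Select[
    Merge[{inchiKeyBlocks[k1], inchiKeyBlocks[k2]}, Apply[UnsameQ]],
    TrueQ]]

(* Same strings as above; "ExternalID" on the InChIKey is an assumption *)
chargedSMILES = "[N+](=O)([O-])[O-].[NH4+]";
neutralInChI = "InChI=1S/NO3.H3N/c2-1(3)4;/h;1H3/q-1;/p+1";
differingBlocks[
  Molecule[chargedSMILES]["InChIKey"]["ExternalID"],
  Molecule[neutralInChI]["InChIKey"]["ExternalID"]]
```

If only the `"protonation"` entry is reported, the two keys differ just in the final character; if `"skeleton"` shows up, the structures being hashed differ more fundamentally.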
Note that this is not a Mathematica-specific bug; the same conversion problem occurs in ChemDraw 22.2, which leads me to think that it is baked into the underlying InChI specification (or at least the standard library that everyone uses).
Recommendation: This is a problem, because a database search on InChIKey may fail to find a match if one party generates the key directly from the charged SMILES while the other party converts to InChI first and generates the key from that. My recommendation is to represent salts in the neutral form (when specifying them by SMILES or InChI) to avoid this problem. If that is not feasible, convert input strings to InChI first (this converts the salt back to the neutral acid and base), then build a new molecule from the resulting InChI and use that molecule to generate InChIKeys.
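The second option can be wrapped in a small helper. This is a minimal sketch rather than a vetted implementation: `canonicalInChIKey` is a hypothetical name, and it assumes (by analogy with the `"InChI"` property used above) that the `"InChIKey"` property also supports `["ExternalID"]`.

```mathematica
(* Hypothetical helper: always route the input through InChI before *)
(* generating the key, so SMILES and InChI inputs take the same path. *)
canonicalInChIKey[input_String] := Module[{inchi},
  (* Step 1: interpret the input (SMILES or InChI) and extract the InChI string *)
  inchi = Molecule[input]["InChI"]["ExternalID"];
  (* Step 2: rebuild the molecule from that InChI and generate the key from it *)
  Molecule[inchi]["InChIKey"]["ExternalID"]
]

(* The two SMILES strings from above *)
chargedSMILES = "[N+](=O)([O-])[O-].[NH4+]";
neutralSMILES = "[N+]([O-])(=O)O.N";
canonicalInChIKey[chargedSMILES] === canonicalInChIKey[neutralSMILES]
```

If the conversions behave as described above, the final comparison should come out `True`; it is worth checking this on your own installation before relying on the helper for database lookups.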