Generating molecule descriptions with GPT-4-vision
[gpt4v llm chemistry cheminformatics]
I have had a few conversations in the past week about how one might build RAG-for-molecules to chat with molecular datasets. An idea that I find appealing is to have a text representation of the molecule, motivated by a paper in which Robocrystallographer was used to generate text descriptions of solids as input for an LLM. If you could generate text descriptions of molecules, perhaps they would serve as an alternative to SMILES as inputs to LLMs, which might help with the problems that LLM-based molecule property regressors have in handling 3d structure. There is a family resemblance to the problem of molecule captioning, but that is typically framed in terms of properties, whereas here we want just a structural description. Here we try to see what gpt-4-vision can do for generating text descriptions from molecular images…
Start by defining an example. Who cares what games we choose? Little to win, but nothing to lose...so pick something whimsical:
lsd = Molecule["CCN(CC)C(=O)[C@H]1CN([C@@H]2Cc3c[nH]c4c3c(ccc4)C2=C1)C"];
picture2d = MoleculePlot[lsd]
picture3d = MoleculePlot3D[lsd]


Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/Misc/LLMVision.m"];

Apparently GPT-4 knows that this might be a drug, and blocks the result:
TextCell@
  LLMVisionSynthesize["You are an expert organic chemist. Describe the molecule in this image:", 
   Image[picture2d], "MaxTokens" -> 4000]

However, if you ask about general properties, you get a description, but it is pretty vanilla:
TextCell@
  LLMVisionSynthesize["You are an expert organic chemist. Describe the structural properties of the molecule in this image and other aspects that would be useful in describing its bonding and properties:", 
   Image[picture2d], "MaxTokens" -> 4000]

Let’s try another one:
TextCell@
  LLMVisionSynthesize["You are an expert organic chemist. Describe the 2d and 3d structural properties of the molecule in this image, which might be useful to compare or contrast it to other molecules:", 
   Image[picture2d], "MaxTokens" -> 4000]
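
The calls so far differ only in the task part of the prompt, so the shared persona and token limit could be factored out. A minimal sketch, assuming `LLMVision.m` has been imported as above; `describeMolecule` is a hypothetical helper name introduced here for illustration and is not part of that package:

```mathematica
(* Hypothetical helper, not part of LLMVision.m: fixes the persona and token limit used in this post *)
describeMolecule[task_String, img_] :=
  TextCell@LLMVisionSynthesize[
    "You are an expert organic chemist. " <> task,
    Image[img], "MaxTokens" -> 4000]

(* Same request as the first prompt above *)
describeMolecule["Describe the molecule in this image:", picture2d]
```

Keeping the persona fixed makes it easier to see which change in wording is responsible for a change in the response.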

What happens if we provide the 3d structure as input? Admittedly, this is not a great example because the molecule is pretty planar:
TextCell@
  LLMVisionSynthesize["You are an expert organic chemist. Describe the structural properties of the molecule in this image, in particular its three dimensional geometric properties: ", 
   Image[picture3d], "MaxTokens" -> 4000]
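
How planar the conformer really is can be estimated numerically rather than by eye. A minimal sketch, assuming the `lsd` molecule defined above and the 3d coordinates that Wolfram Language generates for it; the variable names are illustrative, and hydrogens are included in the coordinate list:

```mathematica
(* Atom coordinates of the embedded conformer (hydrogens included), as plain numbers *)
coords = QuantityMagnitude[MoleculeValue[lsd, "AtomCoordinates"]];

(* Center on the centroid; the singular values then measure the spread along the three principal axes *)
centered = (# - Mean[coords]) & /@ coords;

(* Tolerance -> 0 keeps all three values even if the smallest is tiny *)
{s1, s2, s3} = SingularValueList[centered, Tolerance -> 0];

(* A ratio near 0 suggests a nearly flat molecule; larger values indicate out-of-plane spread *)
s3/s1
```

A molecule with a larger ratio would make a more demanding test of whether the model picks up three-dimensional features from the image.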

Another trial (umm…this is not caffeine…):
TextCell@
  LLMVisionSynthesize["You are an expert organic chemist. Describe the 2d and 3d structural properties of the molecule in this image, which might be useful to compare or contrast it to other molecules:", 
   Image[picture3d], "MaxTokens" -> 4000]

TextCell@
  LLMVisionSynthesize["You are an expert organic chemist. Describe the 3d structural properties of the molecule in this image, especially those which may not be evident simply based on the functional groups that are present or 2d connectivity.", 
   Image[picture3d], "MaxTokens" -> 4000]

Enough beating around the bush, tell me about this molecule, not vague generalities. (But no…it is still not caffeine…)
TextCell@
  LLMVisionSynthesize["You are an expert organic chemist. Describe the 3d structural properties of the molecule in this image, especially those which may not be evident simply based on the functional groups that are present or 2d connectivity. Do not speak in generalities, but describe only specific attributes of the molecule in this image:", 
   Image[picture3d], "MaxTokens" -> 4000]

Conclusion:
- Yes, current gpt-4-vision can take molecule images and say something about them.
- Descriptions tend to focus on functional groups and connectivities; presumably this would be information that could be learned from SMILES strings as input.
- Higher order 3d structural descriptions tend to be vague.
- Problem of hallucinating molecule identities may hinder RAG applications.
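
Since the descriptions mostly cover functional groups and connectivity, one point of comparison is what can be computed directly from the SMILES-derived `Molecule`, with no image involved. A minimal sketch, assuming these property names are available to `MoleculeValue` in your Wolfram Language version; the choice of descriptors is illustrative and was not used in the experiments above:

```mathematica
(* Descriptor names assumed to be available as MoleculeValue properties *)
props = {"MolecularMass", "HeavyAtomCount", "HBondDonorCount",
   "HBondAcceptorCount", "RotatableBondCount", "AromaticRingCount"};

(* Association of property name -> value for the example molecule *)
baseline = AssociationThread[props -> MoleculeValue[lsd, props]]
```

Anything in a generated description that goes beyond such a baseline is where the image input would be adding information.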
ToJekyll["Generating molecule descriptions with GPT-4-vision", "gpt4v llm chemistry cheminformatics"]