I have had a few conversations in the past week about how one might build RAG-for-molecules to chat with molecular datasets. One idea I find appealing is a text representation of the molecule, motivated by a paper in which Robocrystallographer was used to generate text descriptions of solids as inputs for LLMs. If you could generate text descriptions of molecules, perhaps these would serve as an alternative to SMILES as LLM inputs, which might help with the difficulty that LLM-based molecular property regressors have in capturing 3d structure. There is a family resemblance to the problem of molecule captioning, but that is typically framed in terms of properties, whereas here we want just a structural description. So let's see what gpt-4-vision can do for generating text descriptions from molecular images…

Start by defining an example. Who cares what games we choose? Little to win, but nothing to lose...so pick something whimsical:

lsd = Molecule["CCN(CC)C(=O)[C@H]1CN([C@@H]2Cc3c[nH]c4c3c(ccc4)C2=C1)C"];
picture2d = MoleculePlot[lsd]
picture3d = MoleculePlot3D[lsd]

[Image: 2D structure plot of the molecule]

[Image: 3D structure plot of the molecule]

Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/Misc/LLMVision.m"];

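Every prompt in this post opens with the same persona sentence and uses the same `"MaxTokens" -> 4000` option. A minimal sketch that factors out that pattern, assuming `LLMVisionSynthesize[prompt, image, "MaxTokens" -> n]` behaves as in the cells here; `chemistPrompt` and `describeImage` are hypothetical helper names, not part of LLMVision.m:

```mathematica
(* Hypothetical helpers; not part of LLMVision.m *)
persona = "You are an expert organic chemist. ";

(* Prefix a task description with the shared persona sentence *)
chemistPrompt[task_String] := persona <> task

(* Rasterize the plot, send it with the prompt, and display the reply as text;
   assumes LLMVisionSynthesize has been loaded via the Import above *)
describeImage[task_String, plot_, maxTokens_Integer : 4000] :=
  TextCell@LLMVisionSynthesize[chemistPrompt[task], Image[plot], "MaxTokens" -> maxTokens]
```

For example, `describeImage["Describe the molecule in this image:", picture2d]` reproduces the first query. Keeping the persona in one place makes it easier to vary only the task part of the prompt when comparing responses.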

Apparently GPT-4 recognizes that this might be a drug and refuses to describe it:

TextCell@
  LLMVisionSynthesize["You are an expert organic chemist. Describe the molecule in this image:", 
   Image[picture2d], "MaxTokens" -> 4000]

[Output image: GPT-4-vision response declining to describe the molecule]

However, if you ask about general properties, you get a description, but it is pretty vanilla:

TextCell@
  LLMVisionSynthesize["You are an expert organic chemist. Describe the structural properties of the molecule in this image and other aspects that would be useful in describing its bonding and properties:", 
   Image[picture2d], "MaxTokens" -> 4000]

[Output image: GPT-4-vision response]

Let’s try another one:

TextCell@
  LLMVisionSynthesize["You are an expert organic chemist. Describe the 2d and 3d structural properties of the molecule in this image, which might be useful to compare or contrast it to other molecules:", 
   Image[picture2d], "MaxTokens" -> 4000]

[Output image: GPT-4-vision response]

What happens if we provide the 3d structure as input? Admittedly, this is not a great example because the molecule is pretty planar:

TextCell@
  LLMVisionSynthesize["You are an expert organic chemist. Describe the structural properties of the molecule in this image, in particular its three dimensional geometric properties: ", 
   Image[picture3d], "MaxTokens" -> 4000]

[Output image: GPT-4-vision response]
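The "pretty planar" remark can be checked numerically rather than by eye. A minimal sketch, assuming the `"AtomCoordinates"` molecule property returns generated 3D coordinates as a `QuantityArray` in ångströms: fit a best plane by principal component analysis (the plane normal is the eigenvector of the coordinate covariance matrix with the smallest eigenvalue) and report the RMS out-of-plane distance. `planarityRMSD` is a hypothetical name, and the coordinates include whatever atoms, hydrogens included, the molecule object carries:

```mathematica
(* Hypothetical helper: RMS distance of points from their least-squares plane *)
planarityRMSD[coords_?MatrixQ] := Module[{pts, centroid, centered, normal},
  pts = N[coords];
  centroid = Mean[pts];
  centered = (# - centroid) & /@ pts;
  (* For numerical matrices, Eigenvectors sorts by decreasing |eigenvalue|
     and returns normalized vectors, so Last[...] is the plane normal *)
  normal = Last[Eigenvectors[Covariance[pts]]];
  (* Signed distances along the normal, then their root mean square *)
  Sqrt[Mean[(centered . normal)^2]]
  ]

lsd = Molecule["CCN(CC)C(=O)[C@H]1CN([C@@H]2Cc3c[nH]c4c3c(ccc4)C2=C1)C"];
(* Assumption: "AtomCoordinates" gives a QuantityArray in angstroms *)
coords = QuantityMagnitude[lsd["AtomCoordinates"]];
planarityRMSD[coords]
```

The result is in the same units as the input (here ångströms); a value near zero means the atoms are close to coplanar.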

Another trial (umm…this is not caffeine…):

TextCell@
  LLMVisionSynthesize["You are an expert organic chemist. Describe the 2d and 3d structural properties of the molecule in this image, which might be useful to compare or contrast it to other molecules:", 
   Image[picture3d], "MaxTokens" -> 4000]

[Output image: GPT-4-vision response]
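A misidentification like this is easy to catch once the claimed identity is expressed as a structure. A minimal sketch using `MoleculeEquivalentQ` to compare the molecule against caffeine given by its SMILES string (`CN1C=NC2=C1C(=O)N(C(=O)N2C)C`); `identityClaimQ` is a hypothetical helper for screening an LLM's claimed identity before it enters a RAG pipeline:

```mathematica
lsd = Molecule["CCN(CC)C(=O)[C@H]1CN([C@@H]2Cc3c[nH]c4c3c(ccc4)C2=C1)C"];

(* Hypothetical helper: does a claimed structure (given as SMILES) match the actual molecule? *)
identityClaimQ[mol_Molecule, claimedSMILES_String] :=
  MoleculeEquivalentQ[mol, Molecule[claimedSMILES]]

identityClaimQ[lsd, "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"] (* False: this molecule is not caffeine *)
```

This only checks a claim after it has been turned into a structure; mapping a free-text name such as "caffeine" to a structure is a separate lookup step.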

TextCell@
  LLMVisionSynthesize["You are an expert organic chemist. Describe the 3d structural properties of the molecule in this image, especially those which may not be evident simply based on the functional groups that are present or 2d connectivity.", 
   Image[picture3d], "MaxTokens" -> 4000]

[Output image: GPT-4-vision response]

Enough beating around the bush: tell me about this molecule, not vague generalities. (But no…it is still not caffeine…)

TextCell@
  LLMVisionSynthesize["You are an expert organic chemist. Describe the 3d structural properties of the molecule in this image, especially those which may not be evident simply based on the functional groups that are present or 2d connectivity. Do not speak in generalities, but describe only specific attributes of the molecule in this image:", 
   Image[picture3d], "MaxTokens" -> 4000]

[Output image: GPT-4-vision response]

Conclusion:

  • Yes, current gpt-4-vision can take molecule images and say something about them.

  • Descriptions tend to focus on functional groups and connectivity; presumably this is information that could also be learned from SMILES strings as input.

  • Higher-order 3d structural descriptions tend to be vague.

  • The tendency to hallucinate molecule identities may hinder RAG applications.
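The second point suggests an obvious baseline: generate the functional-group and connectivity part of a description deterministically from the SMILES string, in the spirit of Robocrystallographer. A minimal sketch, assuming `MoleculePattern` accepts SMARTS strings; the pattern list, its labels, and `describeMolecule` are hypothetical, illustrative choices rather than an established vocabulary:

```mathematica
lsd = Molecule["CCN(CC)C(=O)[C@H]1CN([C@@H]2Cc3c[nH]c4c3c(ccc4)C2=C1)C"];

(* Hypothetical SMARTS patterns for a few groups; illustrative, not exhaustive *)
functionalGroups = <|
   "amide" -> "C(=O)N",
   "aromatic N-H (e.g. indole)" -> "[nH]",
   "non-aromatic C=C" -> "C=C",
   "aromatic carbon" -> "c"
   |>;

(* Hypothetical template-based describer *)
describeMolecule[mol_Molecule] := Module[{elements, groups},
  (* Element counts from the atom list, e.g. <|"C" -> 20, ...|> *)
  elements = Counts[First /@ AtomList[mol]];
  (* Labels of the groups whose SMARTS pattern matches somewhere in the molecule *)
  groups = Keys@Select[functionalGroups, MoleculeContainsQ[mol, MoleculePattern[#]] &];
  StringJoin[
   "The molecule contains ",
   StringRiffle[KeyValueMap[ToString[#2] <> " " <> #1 &, elements], ", "],
   " atoms joined by ", ToString[Length[BondList[mol]]], " bonds",
   If[groups === {}, ".",
    ", with the following groups: " <> StringRiffle[groups, ", "] <> "."]
   ]
  ]

describeMolecule[lsd]
```

Whether hydrogens appear in the counts depends on how `Molecule` handles implicit hydrogens for SMILES input. A template like this says nothing about 3d shape, which is exactly the gap the experiments above were probing.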

ToJekyll["Generating molecule descriptions with GPT-4-vision", "gpt4v llm chemistry cheminformatics"]