Dall-E-3 image generation in Mathematica
[mathematica
chiguiro
art
dall-e-3
]
As of 24 Nov 2023 (Mathematica 13.3), ImageSynthesize appears to use Dall-E-2 for image generation by default, judging by the kind-of-trash results. With some tricks you can get it to use Dall-E-3 instead:
Here’s an example using the default settings:
ImageSynthesize["A cartoon of a capybara riding a motorcycle and wearing a bowtie"]
ImageSynthesize does not appear to accept an LLMEvaluator option (the way LLMSynthesize and related functions do):
ImageSynthesize["A cartoon of a capybara riding a motorcycle and wearing a bowtie",
  LLMEvaluator -> LLMConfiguration[<|"Model" -> "dall-e-3"|>]]
By default, the OpenAI service connection also appears to use Dall-E-2 (although I am not sure why it throws this error):
ServiceExecute["OpenAI", "ImageCreate", {"Prompt" -> "A cartoon of a capybara riding a motorcycle and wearing a bowtie"}]
However, the "ImageCreate" request accepts an (undocumented) "Model" parameter for specifying the model. Notice how Dall-E-3 better captures the characteristic nose shape:
ServiceExecute["OpenAI", "ImageCreate", {"Prompt" -> "A cartoon of a capybara riding a motorcycle and wearing a bowtie", "Model" -> "dall-e-3"}]
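If you use this a lot, the call can be wrapped in a small helper that defaults to Dall-E-3. This is only a minimal sketch: the names dalleParameters and dalleImage are made up here for illustration, and the only things taken from the experiments above are the ServiceExecute["OpenAI", "ImageCreate", ...] request and its undocumented "Model" parameter, which may change in later versions.

```mathematica
(* Hypothetical helper names; not part of any Wolfram or OpenAI API *)

(* Build the parameter list, defaulting to Dall-E-3 *)
dalleParameters[prompt_String, model_String : "dall-e-3"] :=
  {"Prompt" -> prompt, "Model" -> model}

(* Send the request through the OpenAI service connection.
   Assumes the undocumented "Model" parameter is still accepted. *)
dalleImage[prompt_String, model_String : "dall-e-3"] :=
  ServiceExecute["OpenAI", "ImageCreate", dalleParameters[prompt, model]]

(* Usage (needs an OpenAI API key and network access):
   dalleImage["A cartoon of a capybara riding a motorcycle and wearing a bowtie"] *)
```

Keeping the parameter list in its own function makes it easy to inspect exactly what is sent without spending an API call.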
To confirm that the defaults use Dall-E-2, request it explicitly and compare with the default output:
ServiceExecute["OpenAI", "ImageCreate", {"Prompt" -> "A cartoon of a capybara riding a motorcycle and wearing a bowtie", "Model" -> "dall-e-2"}]
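The same comparison can be run for both models in one go and shown side by side. This is a rough sketch under the same assumption as above (that "ImageCreate" accepts a "Model" parameter); modelRequests and compareModels are hypothetical names, and each model in the list costs one API call.

```mathematica
(* Hypothetical helper names, for illustration only *)

(* One parameter list per model, keyed by the model name *)
modelRequests[prompt_String, models : {__String}] :=
  AssociationMap[{"Prompt" -> prompt, "Model" -> #} &, models]

(* Run each request and label the result with the model that produced it.
   Assumes the undocumented "Model" parameter is accepted. *)
compareModels[prompt_String, models : {__String}] :=
  Row[KeyValueMap[
    Labeled[ServiceExecute["OpenAI", "ImageCreate", #2], #1] &,
    modelRequests[prompt, models]], Spacer[10]]

(* Usage (needs an OpenAI API key and network access):
   compareModels["A cartoon of a capybara riding a motorcycle and wearing a bowtie",
     {"dall-e-2", "dall-e-3"}] *)
```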
ToJekyll["Dall-E-3 image generation in Mathematica", "mathematica openai chiguiro art"]
Parerga and Paralipomena
Came across some recent Wolfram Community posts that provide relevant functionality:
- LLMVisionSynthesize… and LLMVisionFunction for using GPT-4v. Also shows a nice trick of using LLMPrompt["NothingElse"]["JSON"] and LLMPrompt["CodeWriter"].
  - Available at Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/Misc/LLMVision.m"]
- callDALLE .. and other functions that support the Text-to-Speech API and GPT-vision (callTTS, callGPTVision)
- Prompting strategies for Dall-E-3, and advanced use of random seed settings, forbidden topics, etc.