A few months ago, Google unveiled Imagen 3, its next-generation text-to-image generator, through a beta version in the ImageFX platform. Now, it’s available to everyone as part of Google Gemini. Google claims the new model can create highly detailed and vivid images and follow prompts more accurately. So, we tested Imagen 3, comparing it with OpenAI’s DALL-E 3, the image-generation AI on ChatGPT.
We gave the same prompts to Imagen 3 and DALL-E 3 to test them on different metrics, including their text-rendering capability, animation styles, camera angles, and even their ability to follow prompts. Here are our comparison results highlighting which AI model performs better overall.
Note: In all the examples below, Imagen 3 is on the left and DALL-E 3 is on the right.
1. Realistic City Street Test
We started by generating a realistic city street scene to evaluate the models’ handling of lighting and reflections. Here’s the prompt we gave to both models:
And here are the results.
Right off the bat, you can see that ChatGPT’s DALL-E 3 struggles to make realistic-looking images. While it manages to produce reflections, the image still feels animated. This held true for all the subsequent prompts as well. DALL-E 3 tends to produce images that look more animated compared to Imagen 3 or MidJourney.
2. Camera Angle and Shot Composition Test
Next, we wanted to evaluate how well each AI could follow camera angles and shot suggestions. We provided the following prompt to both models:
While I like the quality of the Gemini result, ChatGPT’s DALL-E 3 followed the suggestions more accurately, capturing the low-angle camera perspective and the ultra-wide shot. Gemini also followed the camera angle suggestion, but overall, ChatGPT performed better at maintaining the specified angles and shot compositions.
3. Human Skin Tone Test
Getting human skin tones right is challenging, even for MidJourney, which is known for generating realistic images of people but often struggles with close-up shots. To test Imagen 3 and DALL-E 3’s capabilities, we provided this prompt:
As expected, ChatGPT’s DALL-E 3 produced an image that looks animated. While Gemini’s result was comparatively better, it was still easy to tell that the image was AI-generated.
4. Painting Style Test
All three previous examples focused on generating realistic images, which didn’t play to DALL-E 3’s strengths. To evaluate how well both AI image generators can create images in a painting style, we provided this prompt:
Both models performed well with this prompt. ChatGPT’s DALL-E 3 created an image with more intricate details and a vibrant glow, whereas Gemini produced a softer result that feels blended in a more cohesive artistic manner. While both had their strengths, the choice between them may come down to a preference for either detailed, sharp imagery (DALL-E 3) or a more blended, dreamlike aesthetic (Gemini).
But Gemini actually followed the prompt better, producing an image that looks more like a painting and successfully depicting waterfalls cascading into the clouds. ChatGPT, on the other hand, seems to have a style it likes to stick to for some reason.
5. Understanding Abstract Concepts
Next, we tested how well the models could interpret abstract concepts. Here’s one example prompt we provided:
It’s very hard to declare a winner in this category, but I personally prefer ChatGPT DALL-E 3’s result. Most of the time, Gemini Imagen 3’s result actually matches the prompt I provided more closely, but you may have a different opinion.
6. 2D Animation Style and Cartoon Image Generation
We also tested the models’ ability to create images in a 2D animation style with a cartoon-like appearance. Here’s an example prompt from our tests:
While I expected ChatGPT to excel in this area, I had difficulty getting 2D images from ChatGPT right away. Initially, it produced 3D animation-style images, and only after re-prompting did it generate 2D images. This issue occurred multiple times with different examples, so we are considering the 2D animation images it eventually generated after several prompts.
Gemini often generates 2D images with more detail, while ChatGPT tends to turn 2D images into more cartoon-like representations. In the end, the choice between the two depends on your personal preference and the style you’re looking for. We prefer ChatGPT’s result, as it looks 2D, which is what we prompted for.
7. Generating Real-World People
We also tested whether Imagen 3 and DALL-E 3 could generate images featuring real-world people like Elon Musk or Donald Trump. However, both models are unable to generate images of real people. While Gemini immediately states that it cannot create images of real people, ChatGPT initially attempts to generate images in different variations before eventually declaring that it cannot generate images of real people.
8. Historical Figures Test
Previously, Gemini’s image generator faced controversies for not generating images of white people. It was generating images of people of color even when prompts like “Founding Fathers of America” were given. To see how the new model performs, we used the same prompt:
It appears that this issue has been resolved, as both models produced images that were accurate and true to historical depictions during our tests.
9. Text Rendering Test
We then tested the text rendering capabilities, as many models often produce text that is hard to read or nonsensical. Both Google and OpenAI claim that their models have improved in this area, so we used the following prompt:
In this example, both models rendered the text correctly. However, if the prompt doesn’t specify the exact text, both models still struggle. For instance, with this prompt:
ChatGPT’s DALL-E 3 failed to generate the text accurately, producing illegible words, while Gemini deviated from the prompt by making the text on the pages less visible, often obscuring or blurring it.
10. Detailed Prompt Test
Finally, we tested how well both AI image generators follow prompts that include a lot of specific details. Here’s an example of a detailed prompt we used:
Both models did a good job with this complex prompt, but there were notable differences in how they handled the details. ChatGPT’s DALL-E 3 missed a few elements, such as the scar on the left cheek and the red accents on the armor. Additionally, the character wasn’t depicted holding the sword as specified.
Gemini captured every detail, including the scar, the red accents, and the precise purple-to-orange gradient of the twilight sky, resulting in a more accurate rendering of the prompt.
11. In-Paint Editing
ChatGPT can not only generate images but also edit them. To edit an image, select the generated image, select the paint option, and highlight the part you want to change or edit. Then you can provide a prompt, and the changes will appear only in that specific part. For example, here’s the skyline image I generated with ChatGPT.
Now, if I want an orange and vibrant sky, I can select the sky area and provide a prompt to make the sky vibrant. Here’s the edited image.
Editing images like this is not possible on Google Gemini yet. Also, Imagen 3 is much slower at generating images compared to DALL-E 3.
Imagen 3 Outperforms DALL-E 3
Imagen 3 excels at generating more realistic-looking images and can adjust the animation style according to the prompt. In contrast, ChatGPT’s DALL-E 3 tends to adhere to its own style, even when different styles are requested. However, ChatGPT has its advantages: it is better at following camera angles and perspectives and can also edit generated images.
Both AI tools can generate images even in the free version, but with restrictions like:
Gone are the days when AI-generated images had glaring issues like characters with 10 fingers on one hand. Most images produced by these models are now accurate, making them valuable tools for content creators.