A few days ago, Google rolled out image generation in Gemini 2.0 Flash, and the internet erupted with stunning examples. Now OpenAI is stepping up to the plate, raising the bar even higher by introducing native image generation (powered by GPT-4o) in ChatGPT.
Sam Altman introduced the new feature with enthusiasm, describing it as “one of the most fun, cool things we have ever launched.” He emphasized that while image generation has been around for some time (including OpenAI’s original DALL-E), this new implementation represents a substantial leap forward in utility and quality.
The native image generation feature is now available to all ChatGPT users (free and paid). API access is coming soon.
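API access has not launched yet, so the exact interface is unknown. As a rough sketch, a request will probably resemble OpenAI's existing Images API; the endpoint, model identifier, and parameters below are assumptions, not a published spec:

```python
import json

# Hypothetical request payload, modeled on OpenAI's existing Images API
# (POST https://api.openai.com/v1/images/generations).
# The "gpt-4o" model name for image generation is an assumption --
# OpenAI has not published API details for this feature yet.
payload = {
    "model": "gpt-4o",        # assumed model identifier
    "prompt": "A futuristic city at sunset with flying cars "
              "and neon lights, in a cyberpunk style",
    "n": 1,                   # number of images to generate
    "size": "1024x1024",      # a size supported by the current Images API
}

# Serialize to the JSON body that would be sent with the request.
body = json.dumps(payload)
print(body)
```

Until the real endpoint ships, treat this purely as an illustration of the request shape.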
Time needed: 2 minutes
Using the ChatGPT image generation feature is quite simple. All you have to do is follow these steps:
Log in to ChatGPT at chat.openai.com or in the mobile app. You need a free or paid account to access the image generation feature; free users can generate only three images per day.
Open a new chat or session. Most AI platforms with image generation let you type a prompt directly into the chat interface. Make sure you are using the GPT-4o model, as it is the only model that supports image generation.
Tell the AI what image you want. Be specific – include details like the subject, style (e.g., “realistic,” “cartoon,” “Studio Ghibli”), colors, setting, and any other preferences.
For example: “Generate an image of a futuristic city at sunset with flying cars and neon lights, in a cyberpunk style.”
The model will take a couple of minutes to process your prompt and return the desired image. You can also upload your own image and ask the model to modify it.
Once the image is generated, you’ll see it in the chat. If it’s not what you wanted, you can tweak your prompt (e.g., “Make the sky purple” or “Add a dragon in the foreground”) and ask for adjustments.
If you like the result, there’s usually an option to download the image for personal use.
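The "be specific" advice above can be sketched as a tiny helper that assembles a prompt from its parts. The function and its fields are purely illustrative (ChatGPT simply takes free-form text); structuring the prompt this way just makes it easier to remember to state the subject, style, and details:

```python
def build_image_prompt(subject, style=None, details=None):
    """Assemble a specific image prompt from its parts.

    Illustrative only -- ChatGPT accepts free-form text, but spelling
    out subject, details (colors, setting, extras), and style tends to
    produce more predictable results.
    """
    parts = [subject]
    if details:
        parts.extend(details)  # e.g. colors, setting, other preferences
    prompt = ", ".join(parts)
    if style:
        prompt += f", in a {style} style"
    return prompt

print(build_image_prompt(
    "a futuristic city at sunset",
    style="cyberpunk",
    details=["flying cars", "neon lights"],
))
# -> a futuristic city at sunset, flying cars, neon lights, in a cyberpunk style
```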
Now that you know how to access this feature, let’s look at some examples in the next section.
Prompt: “Generate a 3-part story of a group of kids unboxing a treasure, inside which is a new red coloured chocolate bar, which they eat and go to the chocolate world. Images should be 3D and in comic style. Add speech bubbles:
1 – What’s this?
2 – WOW, a Chocolate Bar
3 (Surprised reaction in image) – Are we in the chocolate world?”
Output:
Observation:
The response nailed the prompt – vibrant 3D comic-style frames with spot-on speech bubbles. However, when I asked ChatGPT to adjust Frame 1 to show the full image (it was cropped), it struggled to follow my instructions accurately.
Prompt: “Convert the given image into a meme – ‘Let the world burn’”
Output:
Observation:
The meme came out decently, but the facial features of the original image were altered in the process. It’s not as precise as I’d hoped.
Prompt: “The image shows the working of a voice agent. It has 3 main parts:
Speech-to-text (STT): Captures and converts your spoken words into text.
Agentic logic: This is your code (or your agent), which figures out the appropriate response.
Text-to-speech (TTS): Converts the agent’s text reply back into audio that is spoken aloud.
Convert this basic image into a vibrant image.”
Output:
Observation:
The model grasped the concept and delivered a lively, upgraded version of the original. Solid execution overall.
Prompt: “Add a money plant to the table”
Output:
Observation:
GPT-4o nailed it, generating a seamless image of a money plant on the table, no awkward patching. Flawless execution!
Prompt: “Create a comic front page showing robots and a scientist”
Output:
Observation:
This one’s a winner – bold, detailed, and perfectly aligned with the prompt. A standout result.
Prompt: “Create a 4-image story based on the following sequence:
GPT-4o believes it’s the coolest model out there.
GPT-4.5 arrives and surpasses GPT-4o in performance.
GPT-4o puts in hard work to improve itself.
GPT-4o becomes smarter by mastering image generation.”
Output:
Observation:
This was the most challenging task. The model kept confusing the robots’ names, but after 10 iterations I managed to get a satisfactory result.
I loved exploring the 4o image generation feature. Did you try it? Share your examples in the comment section below!
OpenAI emphasized that this feature offers a higher degree of creative freedom than previous releases, aiming to balance creative expression with appropriate safeguards. While image generation is currently slower than previous iterations, the team believes the dramatic quality improvement more than justifies the wait and expects to improve speed over time.
This integration marks a significant step toward truly multimodal AI that can seamlessly work across different types of content, opening new possibilities for creative expression, education, business applications, and more.
Stay tuned to Analytics Vidhya Blog for more such content!