ChatGPT 5.2 Image Recognition Capability Enhancement

The image generation capabilities of AI models have been improving rapidly. I used to think that keeping characters consistent within a single image was feasible for about two characters at most, but ChatGPT 5.2 has significantly changed that view.

Specifically, SCW features the following eight characters and a mascot named Squera.

Group image with Gemini prior to 2.5

I wanted to generate the whole group in one image, but Gemini versions prior to 2.5 only produced images like the one below. While the two on the right are faithful to the original art style, the others' clothing and accessories are only vaguely similar, and their faces and hairstyles weren't reproduced.

After experimenting with Gemini and ChatGPT, I found that consistency held only up to about two characters, though refining my prompts might have improved the results.

ChatGPT 5.2 maintains consistency quite well.

Before ChatGPT 5.2 came out, Nano Banana could already generate images with fairly consistent results, but with the arrival of ChatGPT 5.2 it feels like consistency improved significantly all at once. Here is the group image generated with ChatGPT 5.2.

The result is quite faithful to the original artwork. The character on the right, Mr. Shiraishi, is supposed to be more masculine and taller, but he's drawn looking younger here.
Since I didn't specify heights, such discrepancies are unavoidable, but I was surprised to get this in a single attempt. I tried to revise Mr. Shiraishi afterward, but couldn't get the corrections to take.

I realized it can maintain a high level of consistency, and I want to utilize it more going forward.

In 2026, I'll keep following the evolution of generative AI, and I expect to keep being amazed by it.
