I created this ‘Action Figure’ trend, and I have some thoughts.
Friday 11th April 2025
Imagine walking into a toy store and spotting yourself - boxed up, glossy-eyed, and surrounded by your daily essentials. That’s the latest AI image generation trend making waves across social media: people posting pictures of themselves, or of famous people, rendered as Action Figures using ChatGPT.
OpenAI’s image capabilities with DALL·E have always felt good but not great compared with other image generation models. MidJourney has long been a front-runner, and for my own projects I’ve steered towards Leonardo.ai. However, with the recent update to its model, OpenAI’s image generation has become immensely powerful and positioned itself as a strong contender in the image generation space.
Every generation starts with a good prompt, and I opted to put more detail into mine to produce an incredibly specific result.
Make the packaging look as realistic as possible - shiny plastic, a top hook for hanging, and toy-store-style design.
The figure should have dark brown hair, and be wearing a purple suit jacket and a white top, with purple suit jeans, and smart black shoes. They should be wearing purple glasses.
Place accessories next to the figure that reflect its style and image:
This should include:
- A PowerPoint clicker
- A laptop
- A microphone
- A rucksack
On the box:
At the top, write in large letters: Tommy Hills
Below that - description: Performer
Make the image as realistic as possible — as if it's a real toy you'd find in a store.
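If you want to reuse this structure for your own figure, the prompt above can be assembled programmatically. Here’s a minimal sketch in Python; the function name and parameters are my own invention, not part of any official template, and you’d still paste the resulting string into ChatGPT (or an API call) yourself:

```python
# Hypothetical helper for assembling an action-figure prompt in the
# same shape as the one above. All names and fields are assumptions.

def build_figure_prompt(name, role, outfit, accessories):
    """Return a detailed prompt string that reinforces photo-realism
    at both the start and the end, as discussed below."""
    lines = [
        "Make the packaging look as realistic as possible - shiny plastic, "
        "a top hook for hanging, and toy-store-style design.",
        f"The figure should be wearing {outfit}.",
        "Place accessories next to the figure that reflect its style and image:",
    ]
    # One bullet per accessory, mirroring the list in the original prompt.
    lines += [f"- {item}" for item in accessories]
    lines += [
        "On the box:",
        f"At the top, write in large letters: {name}",
        f"Below that - description: {role}",
        "Make the image as realistic as possible - as if it's a real toy "
        "you'd find in a store.",
    ]
    return "\n".join(lines)
```

For example, `build_figure_prompt("Tommy Hills", "Performer", "a purple suit jacket and a white top", ["A PowerPoint clicker", "A laptop", "A microphone", "A rucksack"])` reproduces the gist of my prompt, and swapping the arguments gives you your own version.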
There are a few key choices in the prompt that made a noticeable difference to the result:
- Reinforcing the idea of photo-realism by mentioning it several times throughout the prompt, reducing the margin of misunderstanding for the model.
- There is very little information for the model to fill in or 'imagine'. Detailed instructions about the packaging give the model less wiggle room to improvise.
The other requirement? Supplying the model with reference images.
I snapped a quick portrait and added several shots from my performing portfolio. I uploaded these images directly into ChatGPT with the prompt, using the image upload feature on desktop. This gave the model a solid anchor when shaping the action figure’s face - at least in theory.
And then it was time to generate. It’s at this point I noticed how long the model takes to produce an image: around five minutes, which is by far the longest of any image model I’ve tried. But here were the results:
Successes
- It stuck to the outline of the prompt incredibly well. The purple suit, white shirt and black shoes all translated perfectly into the result.
- The objects are clear and resemble their real-world counterparts. I am especially impressed with the PowerPoint clicker, which, by coincidence, resembles my real-life clicker.
- The text is clear, concise, and free of hallucinations or glitches. Generating legible text is a relatively new capability for image models, and this model has very successfully put the text onto the box.
Struggles
- The face doesn’t match mine. It’s similar in a lot of ways - it got the roundness of my glasses and my hair right - but side by side with real life, it just doesn’t quite match. It’s hard to identify exactly what’s wrong, but it feels a little strange to look at. Like some bizarro-version of me.
- The bottom of the box and the feet are very odd and inconsistent. In all three generations the person extends beyond the edge of the action figure case, cutting off the figure’s feet. It’s unclear what causes this glitch, but it shows that the image generation still needs a bit of refinement. Looking at other people’s generations, it doesn’t seem to happen every time, so I would love to know why it was happening here.
Oddities
Now this was a really strange part of the generations. In the prompt, I outlined four items: a laptop, a microphone, a PowerPoint clicker and a rucksack. Yet in all three generations, five objects were added to my action figure. In the first two generations I have two rucksacks, and in the other I have two laptops. I question why it sticks so rigidly to five objects when I only asked for four. Perhaps there is some ‘template’ or reference in the dataset it is using that requires five objects?
End
So here’s my take on this trend. The improvements being made to image generators are making it really easy to get a fantastic result.
I’ve already started using OpenAI’s image generation in my own projects, with fantastic results. I’m excited to see what the next AI trend is going to be. My personal hope is for a Pop! Vinyl trend!