No "Peak AI" In Sight - OpenAI's ChatGPT o1 pro mode Is Another Quantum Leap
I bought a ChatGPT Pro subscription for $200 a month to get access to OpenAI's latest o1 pro mode and compared it to previous models. The results surprised me.
I am in no way affiliated with OpenAI, Inc. and have received no benefit or payment for this article.
In September I tested OpenAI's o1-preview by asking it to simulate an ant colony in Python/Pygame and was thrilled by the result. Asking an AI model to simulate and visualize an ant colony is admittedly not a scientifically sound way to evaluate its performance. On the other hand, such a simulation is not a standard task with endless boilerplate code to draw on, so the model has to transfer what it knows to a new problem. Besides, you don't need to be a programmer to judge the result: you can simply watch how the ants move and collect the sugar.
Now I have purchased ChatGPT Pro for $200 a month and asked the AI models 4o, o1-mini, o1 and o1 pro mode to write an ant simulation using the following prompt:
Create a simulation of an ant colony foraging for sugar using separate HTML, CSS, and JavaScript files. The ant nest should be positioned in the center of the screen, with 25 sugar piles and 100 ants distributed randomly. Each sugar pile should contain a discrete number of sugar units, approximately normally distributed with a median of around 10 units, and its visual size (radius) should reflect the quantity of sugar it holds. The ants should move in a random, frequently changing manner. If an ant reaches the edge of the screen, it should simply bounce back by changing direction. When an ant encounters sugar, it should pick up one unit, carry it directly to the nest, and then return to roaming randomly. While carrying sugar, the ant should emit a scent trail that gradually fades over time, and this trail should be displayed in gray. Other ants that detect this scent trail should follow it back to the sugar source. After an ant delivers sugar to the nest, it should temporarily ignore scent trails to prevent ants from gathering near the nest where trails converge. Each time a sugar unit is removed from a pile, the pile should visibly shrink to reflect the reduced amount of sugar remaining. Take care to use distinct, appealing visual representations for the ants, the nest, and the sugar piles, and ensure that ants never become stuck and that the simulation runs smoothly.
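As an aside, the sugar pile requirement in the prompt can be made concrete with a small sketch. This is not code produced by any of the tested models, and it uses Python rather than the JavaScript the prompt asks for. The standard deviation of 3, the minimum of one unit per pile, the scale factor and the choice to make the drawn area (rather than the radius itself) proportional to the sugar count are all my own assumptions; the prompt only asks for "approximately normally distributed" sizes and a radius that "reflects" the quantity.

```python
import math
import random


def sample_pile_units(rng, mean=10.0, std_dev=3.0):
    """Draw a discrete pile size from a rounded normal distribution.

    mean and std_dev are illustrative assumptions; the result is clamped
    to at least 1 so that no pile starts out empty.
    """
    return max(1, round(rng.gauss(mean, std_dev)))


def pile_radius(units, pixels_per_sqrt_unit=4.0):
    """Return a radius whose circle area grows linearly with the units.

    The scale factor is a hypothetical value chosen for illustration.
    """
    return pixels_per_sqrt_unit * math.sqrt(units)


# 25 piles, as requested in the prompt; the seed only makes runs repeatable.
rng = random.Random(42)
piles = [sample_pile_units(rng) for _ in range(25)]
radii = [pile_radius(units) for units in piles]
```

Because the radius is recomputed from the current unit count, calling pile_radius again after an ant removes a unit is enough to make the pile shrink visibly.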
Here are the ant simulations that the tested versions of ChatGPT generated directly from my prompt, without any further feedback, corrections, or hints:
ChatGPT 4o
In principle, the ant simulation written by ChatGPT 4o meets all the requirements. However, the ants move quite erratically, and after a while they end up in areas of the screen where there is no sugar at all. Overall, the ants simulated in this way are quite inefficient, so that even after several minutes only some of the sugar has been collected.
ChatGPT o1-mini
ChatGPT o1-mini also implements my instructions well, but the ants gather around the nest relatively quickly because all the scent trails meet there. This happens even though I had anticipated the problem and suggested a solution in my prompt (the ants should ignore the scent trails for a short time after they have delivered the sugar). Another problem is that the ants often follow the scent trails back to the nest instead of to the sugar piles, and then get stuck there without ever having reached the sugar. Perhaps my prompt was not clear enough on this point. Overall, the ants in this simulation are also very inefficient at collecting sugar.
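The "ignore trails for a short time" rule from my prompt can be sketched as a simple per-ant countdown. This is a minimal illustration of the idea, not the code o1-mini or any other model wrote; the class name, the field names and the default of 120 frames are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Ant:
    carrying: bool = False
    cooldown: int = 0  # frames left during which scent trails are ignored

    def deliver_sugar(self, cooldown_frames=120):
        """Drop the sugar at the nest and start ignoring trails for a while."""
        self.carrying = False
        self.cooldown = cooldown_frames

    def tick(self):
        """Advance one simulation frame and count the cooldown down to zero."""
        if self.cooldown > 0:
            self.cooldown -= 1

    def follows_trails(self):
        """Only empty-handed ants whose cooldown has expired react to trails."""
        return not self.carrying and self.cooldown == 0


ant = Ant(carrying=True)
ant.deliver_sugar(cooldown_frames=3)
states = [ant.follows_trails()]
for _ in range(3):
    ant.tick()
    states.append(ant.follows_trails())
```

With a cooldown of three frames, the ant ignores trails right after delivery and for the next two frames, and only reacts to them again once the third tick has passed.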
ChatGPT o1
In ChatGPT o1's ant simulation, the same problems occur as in o1-mini's version: the ants gather around the scent trails, which slowly disappear until all the ants are concentrated in one place. When the last trail disappears, all the ants stream out and collect sugar. Compared to the simulation written by o1-mini, the ants seem to collect the sugar more efficiently.
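The fading of the scent trails can be pictured as a strength value per grid cell that shrinks a little every frame. The following is a rough sketch under my own assumptions (a dictionary keyed by grid cell, a multiplicative decay factor, a cutoff threshold and a white background), not the approach taken by o1.

```python
def fade_trails(trails, decay=0.98, threshold=0.05):
    """Return a new trail map with every strength multiplied by decay.

    Cells whose strength falls below threshold are dropped, which is
    the moment the trail disappears. Both values are assumptions.
    """
    faded = {}
    for cell, strength in trails.items():
        new_strength = strength * decay
        if new_strength >= threshold:
            faded[cell] = new_strength
    return faded


def gray_level(strength):
    """Map a strength in [0, 1] to a gray value from 255 (white) to 0 (black)."""
    clamped = min(1.0, max(0.0, strength))
    return round(255 * (1.0 - clamped))


trails = {(1, 1): 1.0, (2, 2): 0.05}
faded = fade_trails(trails, decay=0.5, threshold=0.05)
```

In this example the strong mark at (1, 1) survives at half strength, while the weak mark at (2, 2) drops below the threshold and is removed.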
ChatGPT o1 pro mode
The simulation written by ChatGPT o1 pro mode implements my instructions very precisely and the amazing result is exactly what I had in mind. The ants do not crowd anywhere and collect the sugar extremely efficiently. After a short time, real paths are formed for the ants to take the sugar back to their nest.
Verdict: OpenAI has once again delivered a quantum leap with ChatGPT o1 pro mode
In my experiment, I didn't give ChatGPT any feedback or any opportunity to optimize and correct its output. From my own experience I know that ChatGPT can achieve really great things over the course of a longer dialogue. Nevertheless, I think that the quality of the direct output for a single prompt already says something about the performance of an AI model. All four versions of ChatGPT tested did an excellent job of following my instructions, producing programs that would keep even an experienced programmer busy for at least a few hours. To my surprise, however, o1 pro mode beat the other versions hands down and implemented all my instructions incredibly well on the first try.
Let's face it, $200 a month is a lot of money and I was very reluctant to spend it on yet another supposedly better version of ChatGPT. In the end, however, curiosity won out and I really wanted to try o1 pro mode for at least a month. Also, because o1 pro mode wasn't presented as a new model, but as a more advanced mode of o1, I didn't expect pro mode to be such a quantum leap, and I'm absolutely thrilled.
As early as the beginning of 2023, a few months after the release of the first public version of ChatGPT, critical voices warned that LLMs had reached "peak AI". Now, at the end of 2024, OpenAI shows once again how much potential LLMs still have - I am tempted to keep my ChatGPT Pro subscription.
Follow me on X for frequent updates (@chaotropy).