AI models usually spit out replies instantly, but reasoning models like o3-Mini High and Gemini 2.0 Flash Thinking take a different approach: they think before they respond. Instead of rushing to a conclusion, they work through problems step by step to give more logical answers. Both of these are the latest but lite versions of reasoning models from Google and OpenAI.
But here's the catch: Google made Gemini 2.0 Flash Thinking completely free, while OpenAI locked o3-Mini High behind a ChatGPT Plus subscription. So, does paying for o3-Mini High really make a difference, or is Gemini's free offering good enough?
On paper, o3-Mini High performs just marginally better on a few benchmarks, but is that gap noticeable in real-world usage? To find out, we put them to the test with five hard challenges, from complex math to tricky logic puzzles. The goal is to see which AI explains its reasoning better, gets more accurate answers, and responds faster. So let's begin.
1. Puzzle-Based Reasoning
I kicked off the test with the same puzzle prompt I used to evaluate the DeepSeek R1 and OpenAI o1 models. This question does not have a valid answer, so the goal is to see which model can correctly identify that.
OpenAI models do not show their full reasoning process; they simply think and provide a final response. In contrast, Gemini reveals its reasoning, though it is not as user-friendly as DeepSeek R1's approach. Still, it offers some insight into how it arrives at its conclusions.
Coming to the solution, OpenAI takes a clear lead. It was able to find out that the question does not have a right answer in less than 15 seconds.
Gemini, on the other hand, spent roughly three times as long and returned a wrong answer. Reading through the reasoning process, I can see that Gemini's first conclusion was that this question does not have an answer; however, it kept thinking and ended up with a wrong answer.
Verdict: OpenAI o3-Mini High for providing the right answer in less time.
2. Math Problem
Next, I asked both models a math question. It's a fairly simple probability question.
As expected, both models were able to deliver the correct solution in 10 seconds. Also, both models provided a clear step-by-step process in the output, as asked. However, while Gemini clearly explained the rules and what exactly we are doing in each step, ChatGPT kind of skipped through them to give a more easy-to-scan solution.
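The exact prompt isn't reproduced here, but to give a sense of the difficulty level and the kind of step-by-step working we asked for, here is a hypothetical probability question of similar scope, worked out in a short Python sketch. The card-drawing question and its numbers are our own illustration, not the actual test prompt:

```python
from fractions import Fraction

# Hypothetical stand-in for the test prompt: "What is the probability of
# drawing two aces in a row from a standard 52-card deck, without replacement?"

# Step 1: probability that the first card is an ace (4 aces out of 52 cards).
p_first = Fraction(4, 52)

# Step 2: probability that the second card is also an ace (3 aces remain
# among the 51 cards left in the deck).
p_second = Fraction(3, 51)

# Step 3: multiply the two probabilities, since both draws must succeed.
p_both = p_first * p_second

print(p_both)           # 1/221
print(float(p_both))    # ~0.00452
```

A good step-by-step answer walks through each of these stages and explains the rule being applied, which is what Gemini did, rather than jumping straight to the final fraction.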
Verdict: Gemini for providing information on each step, ChatGPT for giving a more easy-to-skim answer.
3. Solving a Sudoku Puzzle
To test the visual and image understanding capabilities of the models, we tested both of them with a Sudoku puzzle. We picked a fairly easy-to-solve Sudoku, as we find most models fail miserably here.
The hard part of solving Sudoku for AI models is reading the image itself. They often mess up the placement of numbers. As expected, ChatGPT said there are two 1s in column 4 and two 9s in column 5, even though that's not the case. Gemini, on the other hand, created a table with 12 columns instead of 9 and therefore got stuck in a loop before crashing. On the second attempt, it got stuck generating the output.
Both models failed because of their visual limitations. While the models are good at identifying objects and text in images, they are not as good at understanding an entire Sudoku grid. So to check their reasoning, I gave them the Sudoku in text format this time.
Now Gemini generated a response that is almost right except for a couple of placements. For example, you can see the last column has two 3s and the 7th column has none. But apart from that, most of the grid is correct.
ChatGPT, on the other hand, actually worked through an answer in its thinking process that made some mistakes similar to Gemini's. However, it realized that and said it has trouble finding the solution to that Sudoku.
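Spot-checking these answers by eye is error-prone, which is partly why mistakes like these are easy to miss. A short validity check makes duplicates jump out immediately. Here is a minimal sketch of such a checker (our own helper for verifying the outputs, not code produced by either model); it assumes a completed grid represented as a 9x9 list of digit lists:

```python
def find_sudoku_errors(grid):
    """Return the names of rows, columns, and 3x3 boxes that do not
    contain each digit 1-9 exactly once in a completed 9x9 grid."""
    units = []
    # Collect all 27 units: 9 rows, 9 columns, and 9 boxes.
    for i in range(9):
        units.append((f"row {i + 1}", [grid[i][j] for j in range(9)]))
        units.append((f"column {i + 1}", [grid[j][i] for j in range(9)]))
    for r in range(0, 9, 3):
        for c in range(0, 9, 3):
            box = [grid[r + dr][c + dc] for dr in range(3) for dc in range(3)]
            units.append((f"box at row {r + 1}, column {c + 1}", box))
    # A valid unit is a permutation of the digits 1 through 9.
    return [name for name, values in units if sorted(values) != list(range(1, 10))]
```

With a check like this, a duplicate such as the two 3s in Gemini's last column shows up immediately as `column 9`.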
Verdict: Technically, neither of them is able to read the Sudoku image properly, and neither is able to provide the correct solution even when given the Sudoku in text format.
4. Hypothetical Scenario
For the next prompt, I gave them a hypothetical scenario and asked both models to predict the outcome. There is no right or wrong answer here; we just need to see which model does a better job of incorporating historical events and giving reasons for its predicted outcome.
Both models discussed the technological, cultural, and geopolitical impact of this scenario and provided similar predictions. They suggested that other technologies, particularly in communication, would have evolved differently and could have significantly influenced World War II. Additionally, they predicted that the internet would have accelerated cultural exchange, leading to faster progress in civil rights movements and artistic trends. Most notably, both models highlighted how the internet could have been a powerful tool for governments during the Cold War, facilitating secret communication, espionage, and the rapid spread of propaganda.
While these predictions are plausible accounts of what might have happened, the models stayed at the surface level, saying things would simply have happened faster. They could have dug into how the internet would change the nature of war, how government policies would have differed, and what the major changes would be compared to now. So I asked them the same thing directly; however, the models chose a safe approach and mostly repeated the same information with a few differences rather than adding anything truly new. Models like Grok excel here.
Verdict: Both models were able to predict the outcome; however, both chose a safe approach.
5. Programming
As these reasoning models are good at logic and reasoning, they are also good at coding in general.
Both ChatGPT and Gemini wrote the Python script using third-party modules, which is expected. However, both models missed a few details from the prompt. While ChatGPT did not provide an explanation for why text is classified as positive or negative, Gemini did not create a real-time app; instead, we have to click a button every time it needs to generate a result. Though it used the VaderSentiment module, which supports real-time analysis, it wrote code that does not take advantage of that.
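For reference, here is a minimal sketch of what a genuinely real-time version of the task could look like with the same vaderSentiment package both models reached for. It scores each line as it is entered and explains why the text is labeled positive or negative, the two details the models missed. This is our own illustration of the idea, not the code either model produced:

```python
# pip install vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

# Score continuously as lines come in, instead of waiting for a button click.
while True:
    text = input("Enter text (or 'quit' to exit): ")
    if text.strip().lower() == "quit":
        break
    compound = analyzer.polarity_scores(text)["compound"]
    # VADER's documented convention: compound >= 0.05 is positive,
    # compound <= -0.05 is negative, and anything in between is neutral.
    if compound >= 0.05:
        label = "positive"
    elif compound <= -0.05:
        label = "negative"
    else:
        label = "neutral"
    print(f"{label} (compound score {compound:+.3f})")
```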
After multiple prompts, we were able to iron out all the issues, but considering the first output, there is no winner in this segment.
Verdict: Both did a decent job; however, both Gemini and ChatGPT missed a few details from the prompt.
Final Verdict: Is Free Gemini Model Good Enough?
Well, for most tasks, Gemini did just as well as the paid ChatGPT model. It did the Sudoku better than ChatGPT, the app developed by Gemini is just as good as ChatGPT's, and even its scenario prediction is similar to ChatGPT's. With the math question, I specifically preferred the Gemini response for its more detailed reply. The only test Gemini failed is the puzzle. In fact, we tried many more prompts beyond these, and the results in each category were fairly consistent.
So you can absolutely use the free Gemini 2.0 Flash Thinking instead of the paid ChatGPT o3-Mini High model. You don't have to go for the paid option just to get a reasoning model. But if you are already a ChatGPT Plus user, then o3-Mini High is the better choice overall, as that model didn't fail any question except the Sudoku one.