If you've been following AI lately, you know reasoning is the next big thing. AI models aren't just generating sentences anymore; they're solving problems, making decisions, and thinking through complex scenarios. Now, Google's new Gemini 2.5 Pro has entered the arena, claiming to outthink every other reasoning AI model out there.
So, what's new in Gemini 2.5 Pro? And how does it really stack up against other top models like OpenAI's o3 Mini, DeepSeek R1, and Grok 3 Thinking? I tested them all using real-world prompts.
What is Gemini 2.5 Pro?
Google just revealed its most powerful AI model: Gemini 2.5 Pro. It's a reasoning model, so it can solve complex problems by thinking step by step to reach a logical solution. It understands multimodal inputs like text, images, audio, and video. It's currently available to Gemini Advanced users and free to try in Google's AI Studio.
Gemini 2.5 Pro scored 18.8% on Humanity's Last Exam (HLE), the highest among all reasoning models without using tools or search. HLE is a rigorous benchmark designed to assess AI models' expert-level reasoning across various subjects. For context, o3 Mini scored 14%, and DeepSeek R1 scored 8.6%.
Gemini 2.5 Pro also beat others in multiple benchmarks and claims to be much better at reasoning and coding. In LMArena, where users vote for the better answer, Gemini 2.5 Pro topped the chart with a score of 1,443, higher than any other AI model out there. The only model that beat it in one test was ChatGPT's Deep Research model with 26.6%, but that isn't a reasoning model.
Here's what you need to get excited about in Gemini 2.5 Pro:
As you can see, the model's major advantage is coding, especially where logic and multimodal understanding are involved. So, let's see how it performs in real-world tests compared to other popular reasoning models out there.
Gemini 2.5 Pro vs Other AI Reasoning Models
Since the model is strong in multimodal understanding and coding, I started by testing those areas.
1. Rubik’s Cube Simulation (Code Test)
First, I provided a detailed prompt to create a Rubik's Cube simulation with scramble and solve options. I asked for it in p5.js without HTML and listed all the features, functions, and technical tools needed to make the animations.
To my surprise, Gemini delivered. While the solve option isn't working perfectly, I was able to manually rotate the cube and apply the scramble option successfully.
I also tested it with other models, but none of them delivered proper results. To be frank, Gemini 2.5 Pro is the first model to get the simulations and demos right. Simply put, this wasn't possible with any other AI model before.
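To give a feel for the logic behind the scramble and solve options, here is a minimal sketch in Python rather than p5.js. It is not the code Gemini produced, and the names `make_scramble`, `invert_move`, and `undo_sequence` are made up for illustration. It assumes standard cube notation (U, D, L, R, F, B, with `'` for a counterclockwise turn and `2` for a half turn) and treats "solve" naively as undoing the scramble, which is only one possible approach.

```python
import random

FACES = ["U", "D", "L", "R", "F", "B"]
MODIFIERS = ["", "'", "2"]  # clockwise quarter turn, counterclockwise quarter turn, half turn


def make_scramble(length=20, rng=None):
    """Return a list of random face turns, never turning the same face twice in a row."""
    if rng is None:
        rng = random.Random()
    moves = []
    previous_face = None
    while len(moves) < length:
        face = rng.choice(FACES)
        if face == previous_face:
            continue  # skip repeats so consecutive moves do not merge or cancel
        moves.append(face + rng.choice(MODIFIERS))
        previous_face = face
    return moves


def invert_move(move):
    """Return the move that undoes the given move."""
    if move.endswith("2"):
        return move  # a half turn is its own inverse
    if move.endswith("'"):
        return move[0]  # counterclockwise is undone by clockwise
    return move + "'"  # clockwise is undone by counterclockwise


def undo_sequence(moves):
    """Naive 'solve' (an assumption for this sketch): replay the scramble backwards, inverted."""
    return [invert_move(move) for move in reversed(moves)]
```

For example, `undo_sequence(["R", "U'", "F2"])` returns `["F2", "U", "R'"]`. A real solver has to work from the cube's current state, including any manual rotations, which is likely why the solve option is the harder part to get right.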
2. Logic Puzzle (Reasoning Test)
We also tested some reasoning-based prompts. Here's one. This question doesn't have a definitive answer:
Let's see which model can figure out that this is a paradox. Gemini took just 24 seconds to identify that it's a paradoxical situation with no clear answer. OpenAI's o3 Mini and Grok both took around 40 seconds and predicted the right answer. DeepSeek R1, however, took 434 seconds and got it wrong the first time, though it did get it right when asked again.
This isn't just a one-off case. DeepSeek tends to stumble on more complex questions. That said, the overall difference isn't huge, as most models correctly predicted the answers using logic in most cases.
3. Physics Problem (Math Test)
Next, I tested all the models with some math problems. o3 Mini has led in math until now; however, Gemini 2.5 Pro scores better in benchmarks. Here is one example.
All models solved this accurately and provided clear, step-by-step explanations. While Gemini leads in math benchmarks, the actual performance gap is minimal; all models handled most problems well.
Gemini 2.5 Pro
Gemini 2.5 Pro is a massive improvement over 2.0 Flash Thinking. However, it's more or less on the same level as models like o3 Mini, Grok 3, or DeepSeek R1. That said, when it comes to multimodal understanding, this model finally delivers much better results. Apart from that, we can now say that Gemini has officially joined the ranks of other models when it comes to reasoning.