If you’ve been following AI lately, you know reasoning is the next big thing. AI models aren’t just spitting out sentences anymore; they’re solving problems, making decisions, and thinking through complex scenarios. Now, Google’s new Gemini 2.5 Pro has entered the arena, claiming to outthink every other reasoning AI model out there.

So, what’s new in Gemini 2.5 Pro? And how does it really stack up against other top models like OpenAI’s o3 Mini, DeepSeek R1, and Grok 3 Thinking? I tested them all using real-world prompts.

What is Gemini 2.5 Pro?

Google just revealed its most powerful AI model: Gemini 2.5 Pro. It’s a reasoning model, so it can solve complex problems by thinking step by step to reach a logical solution. It understands multimodal inputs like text, images, audio, and video. It’s currently available to Gemini Advanced users and free to try in Google’s AI Studio.

Gemini 2.5 Pro scored 18.8% on Humanity’s Last Exam (HLE), the highest among all reasoning models without using tools or search. HLE is a rigorous benchmark designed to assess AI models’ expert-level reasoning across various subjects. For context, o3 Mini scored 14%, and DeepSeek R1 scored 8.6%.

Gemini 2.5 Pro also beat others in multiple benchmarks and claims to be much better at reasoning and coding. In LMArena, where users vote for the best answer, Gemini 2.5 Pro topped the chart with a score of 1,443, higher than any other AI model out there. The only model that beat it in one test was ChatGPT’s Deep Research model with 26.6%, but that isn’t a reasoning model.

[Image: Gemini]

Here’s what you need to get excited about in Gemini 2.5 Pro:

As you can see, the model’s major advantage is coding, especially where logic and multimodal understanding are involved. So, let’s see how it performs in real-world tests compared to other popular reasoning models out there.

Gemini 2.5 Pro vs Other AI Reasoning Models

Since the model is strong in multimodal understanding and coding, I started by testing those areas.

1. Rubik’s Cube Simulation (Code Test)

First, I provided a detailed prompt to create a Rubik’s Cube simulation with scramble and solve options. I asked for it in p5.js without HTML and listed all the features, functions, and technical tools needed to make the animations.

To my surprise, Gemini delivered. While the solve option isn’t working perfectly, I was able to manually rotate the cube and apply the scramble option successfully.

I also tested it with other models, but none of them delivered proper results. To be frank, Gemini 2.5 Pro is the first model to get the simulations and demos right. Simply put, this wasn’t possible with any other AI model before.


2. Logic Puzzle (Reasoning Test)

We also tested some reasoning-based prompts. Here’s one. This question doesn’t have a definitive answer:

Let’s see which model can figure out that this is a paradox. Gemini took just 24 seconds to identify that it’s a paradoxical situation with no clear answer. OpenAI’s o3 Mini and Grok both took around 40 seconds and predicted the right answer. DeepSeek R1, however, took 434 seconds and got it wrong the first time, though it did get it right when asked again.

This isn’t just a one-off case. DeepSeek tends to stumble on more complex questions. That said, the overall difference isn’t huge, as most models correctly predicted the answers using logic in most cases.

[Image: Gemini]

3. Physics Problem (Math Test)

Next, I tested all the models with some math tests. o3 Mini has led in math until now; however, Gemini 2.5 Pro scores better in benchmarks. Here is one example.

All models solved this accurately and provided clear, step-by-step explanations. While Gemini leads in math benchmarks, the actual performance gap is minimal: all models handled most problems well.

Gemini 2.5 Pro

Gemini 2.5 Pro is a massive improvement over 2.0 Flash Thinking. However, it’s more or less on the same level as models like o3 Mini, Grok 3, or DeepSeek R1. That said, when it comes to multimodal understanding, this model finally delivers much better results. Aside from that, we can now say that Gemini has officially joined the ranks of other models when it comes to reasoning.

