People are benchmarking AI by having it make balls bounce in rotating shapes

Image Credits: Andrew Mayne

AI benchmark; yellow ‘ball’ on black background

The list of informal, weird AI benchmarks keeps growing.

Over the past few days, some in the AI community on X have become obsessed with a test of how different AI models, particularly so-called reasoning models, handle prompts like this: “Write a Python script for a bouncing yellow ball within a shape. Make the shape slowly rotate, and make sure that the ball stays within the shape.”

Some models do better on this “ball in rotating shape” benchmark than others. According to one user on X, Chinese AI lab DeepSeek’s freely available R1 swept the floor with OpenAI’s o1 pro mode, which costs $200 per month as part of OpenAI’s ChatGPT Pro plan.

👀 DeepSeek R1 (right) crushed o1-pro (left) 👀

Prompt: “write a python script for a bouncing yellow ball within a square, make sure to handle collision detection properly. make the square slowly rotate. implement it in python. make sure ball stays within the square” pic.twitter.com/3Sad9efpeZ

— Ivan Fioravanti ᯅ (@ivanfioravanti) January 22, 2025

Per another X poster, Anthropic’s Claude 3.5 Sonnet and Google’s Gemini 1.5 Pro models misjudged the physics, resulting in the ball escaping the shape. Other users reported that Google’s Gemini 2.0 Flash Thinking Experimental, and even OpenAI’s older GPT-4o, aced the evaluation in one go.

Tested 9 AI models on a physics simulation task: rotating triangle + bouncing ball. Results:

🥇 Deepseek-R1 🥈 Sonar Huge 🥉 GPT-4o

Worst? OpenAI o1: Completely misunderstood the task 😂

Video below ↓ First row = Reasoning models, rest = Base models. pic.twitter.com/EOYrHvNazr

— Aadhithya D (@Aadhithya_D2003) January 22, 2025

But what does it prove that an AI can or can’t code a rotating, ball-containing shape?

Well, simulating a bouncing ball is a classic programming challenge. Accurate simulations incorporate collision detection algorithms, which try to identify when two objects (for example, a ball and the side of a shape) collide. Poorly written algorithms can affect the simulation’s performance or lead to obvious physics mistakes.
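
To make the idea concrete, here is a minimal sketch of one common way to detect a collision between a ball and one side of a shape, and then bounce the ball off it. It is an illustration, not code from any of the posts quoted here: the function names, the tuple-based vectors, and the assumption that the wall is momentarily stationary (its own rotation speed is ignored) are all hypothetical simplifications.

```python
import math

def closest_point_on_segment(p, a, b):
    """Return the point on the segment a-b that is closest to point p."""
    abx, aby = b[0] - a[0], b[1] - a[1]
    length_sq = abx * abx + aby * aby
    if length_sq == 0.0:
        return a  # degenerate wall: both endpoints coincide
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((p[0] - a[0]) * abx + (p[1] - a[1]) * aby) / length_sq
    t = max(0.0, min(1.0, t))
    return (a[0] + t * abx, a[1] + t * aby)

def collide_ball_with_wall(pos, vel, radius, a, b):
    """Check a ball against one wall (assumed stationary for this step).

    Returns (new_pos, new_vel, hit). On a hit, the ball is pushed back out
    of the wall and its velocity is reflected about the wall normal.
    """
    cx, cy = closest_point_on_segment(pos, a, b)
    dx, dy = pos[0] - cx, pos[1] - cy
    dist = math.hypot(dx, dy)
    if dist >= radius or dist == 0.0:
        return pos, vel, False  # no overlap (or no usable normal)
    nx, ny = dx / dist, dy / dist  # unit normal pointing from wall to ball
    new_pos = (cx + nx * radius, cy + ny * radius)  # resolve the overlap
    vn = vel[0] * nx + vel[1] * ny  # velocity component along the normal
    if vn < 0:  # only reflect if the ball is moving into the wall
        vel = (vel[0] - 2 * vn * nx, vel[1] - 2 * vn * ny)
    return new_pos, vel, True
```

Skipping the push-out step, or reflecting the velocity even when the ball is already moving away from the wall, are typical sources of the kind of visible glitches described above, such as a ball that sticks to a wall or tunnels through it.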

X user N8 Programs, a researcher in residence at AI startup Nous Research, says it took him roughly two hours to program a bouncing ball in a rotating heptagon from scratch. “One has to track multiple coordinate systems, how the collisions are done in each system, and design the code from the beginning to be robust,” N8 Programs explained in a post.
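
The “multiple coordinate systems” point can be sketched as follows. This is not N8 Programs’ code: it assumes a square rather than a heptagon to keep the containment check short, and every name in it is hypothetical. The idea is to convert the ball’s position from the screen (world) frame into the rotating shape’s own frame, where the shape is axis-aligned and the check becomes trivial.

```python
import math

def world_to_local(point, center, angle):
    """Express a world-space point in the frame of a shape that is
    rotated by `angle` radians about `center`."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    c, s = math.cos(-angle), math.sin(-angle)  # undo the shape's rotation
    return (dx * c - dy * s, dx * s + dy * c)

def local_to_world(point, center, angle):
    """Inverse of world_to_local: map a shape-frame point back to world space."""
    c, s = math.cos(angle), math.sin(angle)
    return (point[0] * c - point[1] * s + center[0],
            point[0] * s + point[1] * c + center[1])

def ball_inside_square(ball_pos, radius, center, angle, half_side):
    """True if the whole ball lies inside a square rotated by `angle`.

    In the square's local frame the walls are axis-aligned, so the test
    reduces to comparing coordinates against half_side minus the radius.
    """
    lx, ly = world_to_local(ball_pos, center, angle)
    limit = half_side - radius
    return abs(lx) <= limit and abs(ly) <= limit
```

For example, a ball of radius 0.1 at (1.2, 0) is outside a square with half-side 1 when the square is unrotated, but inside it once the square has turned 45 degrees and one of its corners points along the x-axis. A check that ignores the rotation gets this case wrong, which is one way a simulated ball ends up escaping the shape.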

But while bouncing balls and rotating shapes are a reasonable test of programming skills, they’re not a very empirical AI benchmark. Even slight variations in the prompt can, and do, yield different outcomes. That’s why some users on X report having more luck with o1, while others say that R1 falls short.

If anything, viral tests like these point to the intractable problem of creating useful systems of measurement for AI models. It’s often difficult to tell what differentiates one model from another, outside of esoteric benchmarks that aren’t relevant to most people.

Many efforts are underway to build better tests, like the ARC-AGI benchmark and Humanity’s Last Exam. We’ll see how those fare, and in the meantime watch GIFs of balls bouncing in rotating shapes.
