The new ChatGPT o1 model from OpenAI focuses on reasoning to solve tough coding and math problems, areas where earlier OpenAI models struggled. OpenAI claims the o1 model (also called Strawberry) is designed to spend more time thinking before it replies. In this article, we explore what the new o1 model offers, how it can be useful, and most importantly, how it compares with other top-tier models like GPT-4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet. Let's begin.
What Is the OpenAI o1 Model
Until now, OpenAI's language models have been part of the GPT series, such as GPT-3.5, GPT-4, and GPT-4o. The new o1 model marks the beginning of a new "o" series, designed to do more deliberate, complex reasoning before generating a response. Unlike previous models, o1 uses a "chain of thought" approach, internally breaking problems down step by step to provide more precise answers to much harder problems. OpenAI cites PhD students as a target userbase.
Here's a chart OpenAI shared comparing the o1 Strawberry model with their earlier GPT-4o model on PhD-level science questions.
Complex problems require multiple steps. As the number of steps increased, previous models produced inaccurate answers unless the user guided them through each step with a series of prompts. In contrast, the o1 model claims to handle this chain of reasoning on its own, as if it were engaging in an internal dialogue to arrive at the right result.
However, because it spends more time processing and thinking, Strawberry is much slower than the others. In many cases, it hasn't even begun answering a prompt by the time models like GPT-4o have already finished their response.
Highlights of the OpenAI o1 Model
Reasoning
Being better at reasoning and complex tasks makes the new o1 model strong at math, science, coding, and other advanced, high-level work. OpenAI tested these models alongside GPT-4o on a diverse set of exams and ML benchmarks covering math, code, and science.
Where GPT-4o solved only 13% of the problems, the new o1 model solved 83%, and o1-preview managed around 56% accuracy.
Chain of Thought
The o1 model uses a chain-of-thought approach. You can review the full thought process by clicking the "Thought" option at the top. Although you may not see the specific inputs that led to each thought, you can follow the direction of the reasoning and what ChatGPT considered before responding.
How to Access the ChatGPT o1 Model
The new o1 model lineup includes OpenAI o1, OpenAI o1-preview, and OpenAI o1-mini. As of today, the preview and mini models are available to paid ChatGPT Plus users, with usage limits of 30 messages per week for o1-preview and 50 messages per week for o1-mini.
Given those limits, use these models only when necessary. To access o1-preview and o1-mini, open ChatGPT, tap the model name at the top, and select either the o1-preview or o1-mini option to begin using them.
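For readers who prefer working programmatically, here is a minimal sketch of what a call to these models might look like with OpenAI's official Python SDK. Note that this is an assumption on our part: the article only covers access through the ChatGPT app, and API availability of the o1 models may be gated by account tier. The model names come from the article; the helper function `build_request` is our own illustration.

```python
def build_request(prompt: str, model: str = "o1-preview") -> dict:
    """Build the payload for a chat completion call.

    The o1 models reason internally, so a single plain user message is
    usually enough; there is no need for step-by-step hand-holding prompts.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


if __name__ == "__main__":
    # Requires `pip install openai` and OPENAI_API_KEY in the environment.
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(
        **build_request("How many of the letter 'r' are in 'strawberry'?")
    )
    print(response.choices[0].message.content)
```

Swapping `model="o1-mini"` into `build_request` targets the cheaper mini variant instead; everything else stays the same.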
Comparing ChatGPT o1 With GPT 4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro
Since the ChatGPT o1 model is focused on math and coding, we tested its performance in real-world scenarios against other language models, including GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro.
1. Math Question
I began the test by giving this math question to all the AI models.
The output produced by the o1 model is more detailed and also correct, as shown below.
GPT-4o did not follow the instruction to avoid touching or crossing the sloping point, which caused it to produce a wrong answer.
However, when I broke down the steps, GPT-4o was able to generate the right answer. Surprisingly, Gemini 1.5 Pro produced an output that was hard to understand. It inexplicably brought Python into the discussion, even though the question did not mention it at all.
That said, GPT-4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet all managed to solve the problem correctly when I manually guided them through the steps.
2. Coding Question
When it comes to coding, I ran multiple tests that I'm familiar with, and all the models performed similarly. Here is one of the examples I tried:
All the models, not just ChatGPT o1, provided correct code. In fact, we tried the examples provided by OpenAI on their site, and the results were similar. GPT-4o generally struggles with UI-based coding, and this is also the case with ChatGPT o1. When it comes to front-end development, Claude 3.5 Sonnet takes the top spot. However, all the models perform similarly on back-end and logic-based coding.
However, when faced with unusual problems, ChatGPT o1 might outperform the other models, something we have yet to observe.
ChatGPT o1 Model – How It Is Useful in the Real World
ChatGPT o1 is particularly effective at tasks that require advanced reasoning, such as PhD-level math, science, and coding, which may not matter for everyday use. However, if you are looking for help with business planning, managing finances, or programming, tasks that demand solid logical thinking and decision-making skills, we have noticed that the ChatGPT o1 model performs exceptionally well compared to other models. Additionally, since it is included with the ChatGPT Plus subscription at no extra cost, it offers added value to Plus users.