GPT-5 vs Opus 4.1

Summer frontier model update

Aug 08, 2025

Hi Folks,

I hope you are enjoying some good time off. Motorbike season is in full swing for me, but there were some major updates this week worth talking about:

GPT-OSS - August 5
Claude Opus 4.1 - August 5
GPT-5 - August 7

GPT-OSS

OpenAI released their first open-weight language model since GPT-2 (meaning the weights are available for downloading, fine-tuning, and local inference).

Unsurprisingly, it sits at the top of the charts, challenging the Chinese dominance of Qwen and DeepSeek in that category.

Despite being a good model, it barely made the news, because few people care about a model that needs roughly $25,000 of GPU hardware (H100-class) to run at full size.

Claude Opus 4.1

Anthropic released an update to their model, Opus 4.1. The jump does not look big on paper, but if you know the difference between Sonnet and Opus, you know that at this level every percentage point is very hard to gain.

I used it briefly yesterday and did not see a difference, but I was not doing any complex or major task either.

GPT-5 release

OpenAI's presentation for GPT-5 was clearly made for muggles (meaning non-tech people).

As an experienced developer, I found the whole presentation cringe as fuck to watch.

They walked through how complex it is to bootstrap a React application, something that takes an experienced developer a week versus a couple of hours with GPT-5. This is where I got annoyed: as developers, we have been past the “boilerplate” phase for months already. Claude Code and Cursor have made big progress over the past months toward a better experience for the day-to-day work that actually matters.

The rest of the presentation continued with single-shot prompt generation of useless apps.

This is where it clicked for me: “yep, OpenAI is more than ever a consumer company”, one where your aunt will be able to generate her own recipe book app or Candy Crush game.

Perfect 100% score on AIME 2025: basically, a math genius
HealthBench: acts more like a thinking partner than a search engine
Tops the charts on MMMU-Pro: sees, understands, and reasons about everything (MMMU stands for Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark)

My Own Experience

All these presentations are fun and all, but I decided to test it myself. I gave GPT-5 and Opus 4.1 the same single-shot prompt for a small educational web app explaining how the Internet works.

Surprisingly, I prefer the Claude version: it is more fun with its animations and more educational with its Fun Facts. See for yourself!

Claude: https://claude.ai/public/artifacts/c1036318-ab7f-4bbd-b2c5-4073a91a0af7
GPT: https://chatgpt.com/canvas/shared/6895c3d44c488191aeb9e5fdf3862104

Now I will test it for real in Cursor while developing features for notionvibe.com, and I will share my feedback in the coming weeks.

Theo's Experience

Theo is a famous tech YouTuber and was given beta access to GPT-5 along with a bunch of other SF tech people. His observation is that GPT-5 is very good at building exactly what you ask for: it goes deeper, stays focused on the task goal, and is less prone to hallucinations or mistakes.

His 20-minute video:

Community feedback

Hacker News thread feedback:

https://news.ycombinator.com/item?id=44827101

“Mixed but largely optimistic feedback, positioning it as a strong contender against Anthropic's Claude 3 Opus while fueling speculation about upcoming Claude 4 or 4.1 iterations. Users praised GPT-5 for its superior multi-step reasoning, faster inference times (often under 5 seconds for complex queries), and innovative code generation—such as refactoring intricate algorithms in Python or Rust with fewer bugs and higher efficiency than Claude's outputs. However, Claude 3 Opus earned kudos for its reliability in ethical reasoning, factual accuracy, and polished creative content, with some preferring its "safer" approach to avoid hallucinations in high-stakes tasks like legal analysis or secure coding. While GPT-5 shone in productivity benchmarks (e.g., informal scores around 92% on HumanEval vs. Claude's 88%), the community noted its occasional verbosity and higher costs, advising engineers to test both for specific workflows—especially as rumors of Claude's speedier upgrades could soon level the playing field in this rapidly evolving AI landscape.” — By Grok4

As always, stay curious, build and ship.

Have fun nerds

— Pierre

PS: This newsletter was written “single-shot” with my bare hands, without a proofreader or AI (except for the HN summary). Was it worse or better?

Agentic dev 3.o
