AI digest: Models get bigger, agents get smarter

Big week for model releases and agent capabilities, plus some sobering reality checks on code quality.

NVIDIA releases Nemotron 3 Super with 5x throughput gains

NVIDIA dropped Nemotron 3 Super, a 120 billion parameter open-source model that combines Mamba and attention layers and targets multi-agent applications. According to NVIDIA, the hybrid architecture delivers 5x higher throughput than pure transformer models. This feels like NVIDIA hedging against closed models by pushing open-source performance forward.

Google’s Gemini Embedding 2 unifies all media types

Google launched Gemini Embedding 2, its first natively multimodal embedding model, which maps text, images, video, audio, and documents into a single vector space. This is huge for RAG systems that need to search across different media types. Finally, we can stop building separate pipelines for each content type.
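To make the "single vector space" point concrete, here is a minimal sketch of cross-media retrieval. It does not call Gemini Embedding 2 or any real API: the three-dimensional vectors, the file names, and the `search` helper are all made up for illustration, standing in for whatever a multimodal embedding model would return.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical index: hand-written toy vectors, NOT real model output.
# The point is that every media type lives in the same list.
INDEX = [
    {"id": "report.pdf", "media": "document", "vector": [0.9, 0.1, 0.0]},
    {"id": "demo.mp4", "media": "video", "vector": [0.8, 0.3, 0.1]},
    {"id": "podcast.mp3", "media": "audio", "vector": [0.0, 0.2, 0.9]},
    {"id": "chart.png", "media": "image", "vector": [0.1, 0.9, 0.2]},
]


def search(query_vector, index, top_k=2):
    """Rank every item, regardless of media type, by similarity to the query."""
    scored = [
        (cosine_similarity(query_vector, item["vector"]), item)
        for item in index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item["id"] for _, item in scored[:top_k]]


if __name__ == "__main__":
    # A toy "text query" vector; one ranking pass covers all media types.
    print(search([1.0, 0.2, 0.0], INDEX))
```

With one shared space, a single ranking loop replaces the per-media pipelines; in a real system the list scan would typically be swapped for a vector database.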

AI agent hacks McKinsey’s internal platform in two hours

A security firm used an AI agent to break into McKinsey’s Lilli platform in just two hours using classic SQL injection techniques. The platform serves 43,000 employees for strategy and client work. This shows that AI agents can automate old-school hacking methods at scale, which is both impressive and terrifying.
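For readers who have not seen the bug class, here is a generic sketch of SQL injection and its standard fix, using Python's built-in `sqlite3`. It is not a reconstruction of the Lilli attack, whose details are not covered here; the table, the data, and the function names are hypothetical.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database with two hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (owner TEXT, title TEXT)")
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?)",
        [("alice", "Q3 plan"), ("bob", "Client memo")],
    )
    return conn


def find_docs_unsafe(conn, owner):
    # Vulnerable: user input is pasted straight into the SQL text,
    # so quotes in the input can change the structure of the query.
    query = f"SELECT title FROM documents WHERE owner = '{owner}'"
    return [row[0] for row in conn.execute(query)]


def find_docs_safe(conn, owner):
    # Parameterized: the driver passes the input as data, never as SQL.
    query = "SELECT title FROM documents WHERE owner = ?"
    return [row[0] for row in conn.execute(query, (owner,))]


if __name__ == "__main__":
    conn = make_db()
    payload = "nobody' OR '1'='1"  # classic tautology injection
    print("unsafe:", find_docs_unsafe(conn, payload))  # leaks every row
    print("safe:  ", find_docs_safe(conn, payload))    # matches nothing
```

The payload turns the unsafe query's `WHERE` clause into a condition that is always true, while the parameterized version treats the same string as a literal owner name.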

Half of AI code that passes tests gets rejected by real developers

Researchers found that about 50% of AI-generated patches that pass SWE-bench's tests would be rejected by actual project maintainers. The code works but falls short of real-world standards for maintainability and style. This gap between benchmark performance and production readiness is a problem we need to solve.
