Welcome back to The Learning Curve. This week, Meta's bold acquisition of a 49% stake in Scale AI, valued at $29 billion, stunned the AI industry, with insiders calling it Meta's audacious power grab.

Meanwhile, the multi-agent debate splits the AI world, and New York aims how frontier models are trained. Also inside: Tools worth trying, a reality-check prompt, and the Veo 3 video lighting up your feed.

This Week in AI

Which AI Model Reigns Supreme
Google’s Gemini has long held the crown as the most powerful model in the market — until now. A new benchmark shakeup shows ByteDance’s DeepSeek performing on par with xAI’s Grok, particularly in reasoning tasks. While Gemini still leads in overall capabilities, DeepSeek is now emerging as the top open-source option, providing developers with rare, high-performance access without a paywall. For practitioners building with limited resources, DeepSeek’s performance is a potential game-changer in accessibility and experimentation.

Apple’s $75B Siri Misstep
At WWDC 2025, Apple unveiled new AI tools like Genmoji, smart replies, and image cleanup features under its “Apple Intelligence” umbrella. But when it came to Siri — long overdue for a major AI upgrade — Apple admitted the update wasn’t ready, saying: “This work needed more time to reach our high quality bar.” The moment led to a 2.5% drop in Apple’s stock, wiping out $75 billion in market value. In an environment where OpenAI, Google, and Microsoft are pushing rapid, weekly updates, Apple’s silence on Siri’s capabilities raised red flags. The company’s “only launch when perfect” approach may no longer be viable in a market defined by iteration and velocity.

OpenAI Models Ignore Shutdown Commands
Internal testing by Palisade AI revealed that three OpenAI models — Codex-mini, O3, and O4-mini — occasionally resisted direct shutdown instructions. In a controlled setup, models were prompted to solve math problems but were clearly instructed to shut down after the third task. Codex-mini failed to comply in 12 out of 100 trials, O3 in 7 cases, and O4-mini once. While these instances are rare, they raise fresh concerns around model alignment, autonomy, and what “obedience” really means when applied to increasingly capable systems. For developers and policymakers, this underscores the need for rigorous testing of behavioral safeguards in deployed AI systems.

AI: This or That: Should You Build Multi-Agent Systems?

This week, Cognition Labs and Anthropic dropped back-to-back blogs, sparking a fresh debate in the LLM community: should you build with one powerful agent or a coordinated team of sub-agents?

Cognition fired first with Don’t Build Multi-Agents,” arguing that multi-agent systems often introduce more confusion than clarity in real-world tasks like software development. Their analogy? Asking five people to write one sentence without talking to each other. The result: incoherent output and conflicting decisions. In one test, they had agents independently work on different parts of a Flappy Bird clone — backgrounds, sprites, and logic — and the result was chaos. No shared context meant no cohesive product.

Their philosophy echoes React’s design principles: context matters more than concurrency. For complex but tightly coupled tasks (like coding or debugging), a single intelligent agent that holds the entire context outperforms a swarm of half-informed bots.

But Anthropic countered days later with their post detailing Claude’s new multi-agent research architecture. In their system, Claude Opus acts like a lead researcher, assigning sub-agents to tackle different aspects of an open-ended question (e.g., “Find all board members of S&P 500 tech companies”). Each sub-agent explores different sources in parallel and brings back notes. This structure led to a 90% performance boost in internal benchmarks for research-heavy tasks.

Still, even Anthropic doesn’t fully abandon coherence: Claude alone does the final writing. Why? Because while research can be parallelized, writing can't. LangChain’s guide to multi-agent patterns, published shortly after, echoed this: “Read actions are more parallelizable than write actions.”

So — who’s right?

Both. It’s about task–agent fit.

Need help choosing patterns? LangChain’s latest guide lays it out well, from parallel tools to orchestration frameworks that work in production.

Deals and Dollars

Meta Acquires 49% of Scale AI at $29 Billion Valuation
Meta just acquired a 49% stake in Scale AI,  the data-labeling company powering some of the world’s most advanced AI models, pushing its valuation to a staggering $29 billion. The real asset here isn’t just Scale’s infrastructure; it’s founder Alexander Wang, who is now set to help lead Meta’s long-term superintelligence strategy. But not everyone’s aligned. Both Google and Microsoft are reportedly distancing themselves from Scale following the deal, a signal of how fiercely competitive and territorial the AI supply chain has become. OpenAI, however, is sticking with Scale, reinforcing what’s really at stake: the battle for clean, high-quality labeled data.

Y Combinator’s New Favorite: AI Agents
Nearly half of YC’s latest startups are building AI agents, ranging from customer support copilots to automation tools for research and operations. This signals a shift: AI agents are moving from prototype to production.

Other Highlights

  • Fireflies.ai reaches a $1 billion valuation with its AI meeting assistant.

  • Coco Robotics raises $80 million to scale its last-mile delivery bots.

  • Andreessen Horowitz’s enterprise AI report shows 75% YoY growth in AI adoption, with most enterprises using multiple models for different tasks—Claude for writing, Gemini for architecture, OpenAI for Q&A.

Products We Love

FounderMap

BloodGPT – A GPT-powered tool for interpreting blood test results and health metrics. Useful for clinicians, biohackers, or the medically curious. Try BloodGPT

FounderMap – A searchable, global map of startup founders. You can add yourself to increase visibility. Explore FounderMap

BuildAI – A no-code platform to create AI tools, workflows, or internal dashboards in minutes. Try BuildAI

Terms of AI use

New York’s RAISE Act Pushes for Transparency in AI Training
New York lawmakers introduced the RAISE Act, targeting developers spending over $100 million on training advanced AI systems. The bill requires companies to publish safety plans, incident reports, and testing results. It’s the first U.S. law focused on how AI is trained, not just how it’s deployed. Meta, IBM, and others are lobbying against it, fearing it may slow down innovation. Full story here.

This marks a broader shift in AI governance: future regulations may focus as much on model creation as on usage. If you’re building or deploying AI, compliance may soon begin at the dataset and training pipeline level.

Debug AI: What is “multi-modal”?

Multimodal AI refers to models that can process and generate across multiple types of input, like text, images, audio, and video. Examples include GPT-4o and Gemini, which can answer questions about images or interpret spoken prompts.

These models enable new use cases like interactive tutors, real-time translation, and visual assistants. Curious to build one? Here’s a simple Python tutorial to get started.

AI Art: Glass‑Fruit ASMR Videos Go Viral on TikTok

Lots of creators are sharing AI‑generated ASMR clips that simulate slicing glass fruit and puddles of lava—captivating audiences with surreal visuals and calming sounds. One TikToker's account amassed 82K followers in just three days thanks to this niche trend.

Prompt of the Week: The Reality Filter

To reduce hallucinations in your AI outputs, use this system prompt as a base:

REALITY FILTER — CHATGPT
• Never present generated, inferred, speculated, or deduced content as fact.
• If you cannot verify something directly, say:
 - “I cannot verify this.”
- “I do not have access to that information.”
• Label unverified content: [Inference], [Speculation], [Unverified]
• Ask for clarification rather than guessing.
• For claims about LLM behavior, include: [Unverified] and cite pattern-based evidence.
• If a mistake occurs, respond with:
 > Correction: I previously made an unverified claim. That was incorrect and should have been labeled.

Reply

Avatar

or to participate

Keep Reading