Anthropic's 81,000-Person Study Reveals a Perception Gap, Robotics Explodes on Hugging Face, and Mistral Bets Against Fine-Tuning
The largest qualitative AI study ever conducted shows people think they are collaborating with AI when they are actually delegating to it. Meanwhile, robotics datasets grew 23x in a year to become Hugging Face's fastest-growing category, and Mistral launched Forge to let enterprises train models from scratch.
Jeff Brook
AI Researcher — Founder, AI Daily News
Anthropic just published the largest qualitative AI study ever conducted, and the most striking finding is not about what people want from AI — it is about how badly they misjudge their own relationship with it. Nearly 81,000 people sat down for ten-minute AI-conducted interviews. The gap between what they reported doing and what they actually do should recalibrate how you think about AI adoption in your own organisation.
What did 81,000 people reveal about how they actually use AI?
Anthropic Interviewer: The largest qualitative AI study — Anthropic
Anthropic built a tool called Anthropic Interviewer — an AI system that conducts adaptive, ten-to-fifteen-minute research interviews at scale — and deployed it to study 1,250 professionals across the general workforce, scientists, and creatives. The methodology is novel: Claude generates interview rubrics, conducts real-time adaptive conversations, then human researchers collaborate with Claude to extract themes.
The headline numbers from that 1,250-person cohort are useful. Among the general workforce, 86% report AI saves time and 65% are satisfied with AI's role. Among creatives, 97% report time savings and 68% cite quality improvements. Scientists are more cautious: 79% cite trust and reliability as primary barriers, and they primarily use AI for manuscript writing and code debugging rather than core research.
But the finding that should stick with you is the perception gap. When asked to characterise their AI use, participants described a 65/35 split between augmentation and automation — they believed they were mostly collaborating. Actual Claude usage patterns showed a near-even 47/49 split. People perceive their AI interactions as more collaborative than they functionally are.
This matters for deployment strategy. If your team thinks they are steering AI when they are actually delegating to it, your quality controls are miscalibrated. The 69% who report social stigma around AI use at work have a reason to underreport their actual dependence, so your organisation's real AI exposure is probably higher than your survey data suggests.
What to do: Run a usage audit, not a sentiment survey. Instrument actual AI interaction patterns — what percentage of outputs are used verbatim versus edited substantially? The perception gap means self-reporting will systematically understate automation and overstate collaboration.
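One way to start that audit is to compare each AI draft with the text that was finally shipped and bucket the result. The sketch below is a minimal illustration, not a prescribed method: it assumes you can log (AI output, final text) pairs, it uses character-level similarity from Python's standard `difflib`, and the names `classify_edit`, `audit`, and both thresholds are hypothetical values you would tune against a hand-labelled sample.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical thresholds; tune them against a hand-labelled sample.
VERBATIM_MIN = 0.95      # similarity at or above this counts as used verbatim
SUBSTANTIAL_MAX = 0.60   # similarity below this counts as substantially edited

LABELS = ("verbatim", "lightly_edited", "substantially_edited")


def classify_edit(ai_output: str, final_text: str) -> str:
    """Label one interaction by how much of the AI draft survived."""
    similarity = SequenceMatcher(None, ai_output, final_text).ratio()
    if similarity >= VERBATIM_MIN:
        return "verbatim"
    if similarity < SUBSTANTIAL_MAX:
        return "substantially_edited"
    return "lightly_edited"


def audit(pairs):
    """Return the share of interactions that fall into each category."""
    counts = Counter(classify_edit(ai, final) for ai, final in pairs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: counts[label] / total for label in LABELS}


if __name__ == "__main__":
    sample = [
        ("same text", "same text"),
        ("abcdefghij", "abcdefgXYZ"),
        ("aaaa", "bbbb"),
    ]
    print(audit(sample))
```

A high verbatim share points to delegation, whatever people say in a survey. Character similarity is a crude proxy, so treat the buckets as a first pass and spot-check them by hand.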
Why is robotics the fastest-growing AI category nobody is talking about?
State of Open Source on Hugging Face: Spring 2026 — Hugging Face
Hugging Face's spring ecosystem report landed with headline numbers — 2 million models, 13 million users, Chinese models at 41% of downloads — but the most actionable signal is buried in the sector trends. Robotics datasets grew from 1,145 to 26,991 in a single year, a 23x increase that vaulted robotics from the 44th most active category to first, surpassing text generation. The RoboMIND dataset alone contains 107,000 real-world trajectories across 479 tasks. LeRobot's GitHub stars nearly tripled.
This is not gradual growth. This is a phase transition. The same pattern that preceded the LLM explosion — a critical mass of open training data attracting a flood of practitioners — is now happening in embodied AI.
The broader ecosystem shifts reinforce this. Individual developers now account for 39% of model uploads, up from 17% in 2022, while corporate labs dropped from 70% to 37%. Alibaba's Qwen family has generated over 200,000 derivative models, more than Google and Meta combined. The median deployed model is still just 406M parameters — practitioners continue to vote for small and efficient over large and expensive.
Model engagement peaks immediately after release and declines within roughly six weeks. If you are maintaining an open model, continuous iteration is not optional — DeepSeek's sustained dominance comes from rapid release cadence, not from any single checkpoint.
What to do: If you are building anything adjacent to physical AI — manipulation, navigation, sim-to-real transfer — the open robotics data ecosystem has crossed the critical mass threshold. Evaluate RoboMIND and LeRobot for your use case. If you are purely in language and reasoning, watch this space: the tooling and infrastructure patterns being developed for robotics will feed back into how we think about multi-modal grounding.
Mistral Forge: a bet against the fine-tuning consensus
Mistral bets on 'build-your-own AI' as it takes on OpenAI, Anthropic in the enterprise — TechCrunch
Mistral launched Forge, a platform that lets enterprises train custom AI models from scratch on their own data, at Nvidia's GTC conference. This is an explicit counter-position to the prevailing enterprise AI strategy of fine-tuning existing foundation models or bolting on retrieval-augmented generation.
The argument: some organisations need models that are fundamentally theirs, not adaptations of someone else's weights. For regulated industries — finance, pharma, defence — owning the base model eliminates an entire category of vendor dependency. You control the training data, the architecture decisions, and the update cadence.
The counterargument is equally strong. Fine-tuning is cheaper, faster, and battle-tested. Training from scratch demands data engineering maturity that most enterprises do not have. Mistral is betting that the top end of the enterprise market will pay for control — the question is whether that segment is large enough to sustain the business.
What to do: If you are in a regulated industry with genuine proprietary data advantages, evaluate Forge against your current fine-tuning pipeline. For everyone else, this is worth monitoring but not worth switching for — fine-tuning and RAG remain the pragmatic defaults.
The Pentagon wants AI trained on classified data — and is moving away from Anthropic
The Pentagon is planning for AI companies to train on classified data — MIT Technology Review
The Pentagon is setting up secure environments where AI companies can train military-specific models on classified datasets. This goes beyond the current arrangement, where models like Claude are used for inference in classified settings — this is training on the data itself. Separately, TechCrunch reports the Pentagon is developing alternatives to Anthropic following what it describes as a dramatic falling-out.
Two implications. First, defence AI is becoming a distinct market with its own infrastructure requirements — secure training environments, cleared personnel, air-gapped model serving. If government contracts are on your roadmap, these capabilities are table stakes. Second, the Anthropic situation demonstrates how quickly political dynamics can reshape AI procurement. Technical merit is necessary but not sufficient when selling to government.
What to do: If you are building for government, start investing in secure training environment capabilities now. The compliance moat is forming and early movers will have structural advantage.
CODA cuts reasoning costs 60% on easy tasks without losing accuracy
CODA: Difficulty-Aware Compute Allocation for Adaptive Reasoning — arXiv
CODA addresses a problem you have likely noticed: reasoning models overthink simple queries, burning tokens on rationales that add no accuracy. The system uses two gating mechanisms to modulate a reward signal — one penalises verbosity on easy tasks, the other encourages deeper reasoning on hard ones. No external annotations required; difficulty is estimated through group-based rollouts.
The result: 60%+ token reduction on easy tasks with maintained accuracy, and improved performance on hard tasks through incentivised deliberation. If you are running reasoning models at scale, the savings compound fast.
What to do: Watch for CODA-style adaptive compute in the next generation of hosted reasoning APIs. If you are training or fine-tuning your own reasoning models, the difficulty-aware reward shaping technique is implementable now.
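To make the idea concrete, here is a rough sketch of difficulty-gated reward shaping. It is not the formula from the CODA paper: the linear gates, the thresholds `easy_at` and `hard_at`, the weights `alpha` and `beta`, and the `Rollout` type are all assumptions made for illustration. The only parts taken from the description above are that difficulty is estimated from a group of rollouts for the same prompt, that one gate penalises length on easy prompts, and that the other rewards longer reasoning on hard ones.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Rollout:
    correct: bool   # did this sampled answer pass the verifier?
    tokens: int     # length of the reasoning trace


def shaped_rewards(group, max_tokens=4000, alpha=0.5, beta=0.2,
                   easy_at=0.75, hard_at=0.25):
    """Hypothetical difficulty-gated length shaping for one prompt's rollouts.

    Assumes `group` is a non-empty list of Rollout objects for the same prompt.
    """
    # Difficulty proxy: the fraction of rollouts in the group that were correct.
    pass_rate = mean(1.0 if r.correct else 0.0 for r in group)
    # Each gate ramps linearly from 0 to 1 as the prompt looks easier or harder.
    easy_gate = max(0.0, (pass_rate - easy_at) / (1.0 - easy_at))
    hard_gate = max(0.0, (hard_at - pass_rate) / hard_at)

    rewards = []
    for r in group:
        length = min(r.tokens, max_tokens) / max_tokens  # normalised to [0, 1]
        base = 1.0 if r.correct else 0.0
        # Easy prompts: subtract a length penalty. Hard prompts: add a length bonus.
        rewards.append(base - alpha * easy_gate * length + beta * hard_gate * length)
    return rewards


if __name__ == "__main__":
    easy = [Rollout(True, 100), Rollout(True, 2000)]
    hard = [Rollout(False, 400), Rollout(False, 3200)]
    print(shaped_rewards(easy))  # the shorter correct answer scores higher
    print(shaped_rewards(hard))  # the longer attempt scores higher
```

In this sketch, prompts of middling difficulty leave the base reward untouched, so length only matters at the extremes. A real implementation would need to guard against rewarding padding on hard prompts, which this version does not attempt.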
Google makes Personal Intelligence free for all US users
Bringing the power of Personal Intelligence to more people — Google AI Blog
Google expanded Personal Intelligence — the feature connecting Gmail, Photos, and other Google apps to Gemini — to all US users for free. Previously limited to paid AI Pro and AI Ultra subscribers, this is now available across AI Mode in Search, the Gemini app, and Gemini in Chrome.
This is Google exploiting its deepest moat: your personal data graph. No competitor has simultaneous access to your email, photos, calendar, documents, and search history. Making it free raises switching costs for hundreds of millions of consumer users.
Quick hits
- Nvidia Nemotron 3 Nano 4B ships as a hybrid Mamba-Transformer at 4 billion parameters — 18 tokens/sec on a Jetson Orin Nano 8GB, 100% accuracy recovery at FP8 quantisation, and available in GGUF for llama.cpp. The most compelling edge model in its class right now.
- Garry Tan's Claude Code setup went viral on GitHub — thousands are trying it, and the polarised reactions are a useful barometer of how mainstream agentic coding has become.
- Nvidia DLSS 5 was billed as the biggest graphics breakthrough since ray tracing, but early demos drew unfavourable comparisons to motion smoothing. A presentation problem, not necessarily a technology problem.
- IP KVM vulnerabilities disclosed across four manufacturers — internet-exposed devices giving BIOS-level access. If you have these in your infrastructure, patch now.
- Split Federated Learning paper proposes architectures reducing training delay and communication overhead for distributed learning across data silos.
- OpenAI launched Parameter Golf, a developer challenge — details still emerging.
Bottom line
People believe they are collaborating with AI when they are actually delegating to it — and until you instrument the difference, your organisation's quality controls and risk exposure are calibrated to a fiction.