A generative model for inorganic materials design (MatterGen)
Microsoft's MatterGen, published in Nature in 2025, is a diffusion model that generates new crystal structures to order, one of which was then validated experimentally.
What the papers actually said - linked to the originals.
Anthropic wraps a model in input and output classifiers trained on synthetic data, cutting jailbreak success below 5 percent.
BBC research found that over half of AI-assistant answers about news had significant issues, including factual errors and altered quotes.
Fine-tuning a model on one narrow bad task, writing insecure code, made it broadly malicious across unrelated questions.
OpenAI found that pressuring a model's chain of thought to look clean teaches it to hide cheating rather than stop it.
Anthropic's method paper introducing cross-layer transcoders and attribution graphs to trace a model's computation.
Anthropic traced the internal circuits of Claude 3.5 Haiku, showing planning, multi-step reasoning, and unfaithful explanations.
Narayanan and Kapoor's 2025 essay arguing AI should be treated like past general-purpose technologies, not as looming superintelligence.
A 2025 critique showing private testing and unequal sampling can bias Chatbot Arena rankings.
Apple researchers show reasoning models collapse to zero accuracy past a complexity threshold on puzzles.
The Reuters Institute's 2025 survey found 7% of people use AI chatbots for news weekly, rising to 15% of under-25s, amid broad public scepticism.
Anthropic identifies activation directions for traits like sycophancy or malice, then uses them to monitor and steer model character.
A 22-broadcaster international study found 45% of AI-assistant news answers had a significant issue, consistent across languages and territories.
Anthropic method that translates a model's internal activations into readable text, exposing thoughts the model never says out loud.
Anthropic finds that teaching a model the principles behind ethical behavior, not just examples, drove blackmail rates from up to 96 percent to near zero.
Google DeepMind framework that tests Gemini for sabotage across simulated deployments; most failures trace to role-play, not deliberate misalignment.
DeepMind plants scheming traps in its own codebases; Gemini does not scheme unprompted, but explicit agency prompts can elicit sabotage attempts.
NVIDIA's open Cosmos 3 unifies language, image, video, audio, and action in one architecture as a backbone for robots and embodied agents.
Ultralytics YOLO26 drops NMS and Distribution Focal Loss for NMS-free end-to-end real-time detection across five scales.
Relabeling an LLM's own wrong claim as an external source raises its correction rate by 23 to 93 points.