MusicLM: Generating Music From Text
Google's 2023 MusicLM generated minutes of coherent music from a text caption, and the paper released MusicCaps, a dataset of 5,500 music-text pairs captioned by experts.
What the papers actually said, linked to the originals.
A 2023 Breakthrough Listen study used a deep-learning autoencoder to scan 820 stars for technosignatures, flagging eight signals to recheck.
BLIP-2 bridged a frozen image encoder and a frozen LLM with a tiny Q-Former, beating Flamingo with 54x fewer trainable parameters.
A University of Chicago tool that adds near-invisible 'cloaks' to art so AI models cannot easily learn and copy an artist's style.
A 2023 PLOS Digital Health study found ChatGPT scored at or near the passing threshold on all three US medical licensing exams with no medical training.
The 2023 Meta paper that taught a language model to decide on its own when to call APIs like a calculator or search engine.
ControlNet, the 2023 method that adds pose, edge, and depth control to a frozen diffusion model using zero-initialized convolutions.
A 2023 paper showing an attacker could poison popular web-scale datasets like LAION-400M for around $60 by exploiting expired domains.
Hyena replaces attention with long implicit convolutions and gating, reaching Transformer quality at subquadratic cost.
The 2023 Greshake paper that named indirect prompt injection: hiding malicious instructions in data an LLM later reads.
Consistency models generate images in a single step by mapping noise directly to data, cutting diffusion's many sampling steps.
PaLM-E injected images and robot sensor data into a 562B-parameter language model, planning robot tasks while topping vision benchmarks.
Diffusion Policy generated robot actions with a denoising diffusion process, beating prior methods by 46.9% on average.
A study reported that GPT-4 scored well enough on the Uniform Bar Exam to pass in every US jurisdiction that uses it.
Meta's 2023 Science paper showed a 15-billion-parameter protein language model could predict structures fast enough to map 600 million metagenomic proteins.
An OpenAI-led paper estimated about 80% of US workers could have at least 10% of their tasks affected by LLMs.
A 2023 paper that let agents learn from written self-reflection stored in memory rather than weight updates, hitting 91% on HumanEval.
A 27-author manifesto arguing that progress in AI should draw on neuroscience, proposing an embodied Turing test as a benchmark.
The 2023 Microsoft Research paper arguing an early GPT-4 showed sparks of general intelligence, which set off a fierce debate.
Bloomberg's 50-billion-parameter language model, trained on a 363-billion-token financial corpus plus general data.
The 2023 Madaan et al. paper where a model critiques and improves its own output across iterations, with no extra training.
A 2023 KAUST paper that paired role-playing LLM agents and let them cooperate autonomously through inception prompting.
Google's paper on TPU v4, which uses optical circuit switches to rewire 4,096 chips into a reconfigurable AI supercomputer.
Meta's 2023 Segment Anything introduced a promptable segmentation model and SA-1B, a dataset of over 1 billion masks on 11 million images.
The 2023 Stanford and Google paper that put 25 LLM-driven characters in a sandbox town and watched them plan a party on their own.
A 2023 paper pairing symbolic regression with logical reasoning to find equations that fit data and follow from background theory.
Meta's DINOv2 trained a billion-parameter vision transformer with no labels, producing general features that work across tasks without fine-tuning.
LLaVA connected a CLIP vision encoder to an LLM and trained it on GPT-4-generated image instructions to build an open visual assistant.
Meta's 2023 paper on Fully Sharded Data Parallel, which shards parameters, gradients, and optimizer states across workers to train far larger models.
The first large field study of generative AI at work found a 14% average productivity gain for support agents, concentrated among novices.
The 2023 Schaeffer et al. paper, a NeurIPS best-paper winner, arguing that LLM emergent abilities are an artifact of the chosen metric, not the model.
Meta's ImageBind learned a single embedding across six modalities using only image-paired data to tie them all together.
The 2023 Yao et al. framework letting models search over branching reasoning steps with lookahead and backtracking.
A follow-up study argued GPT-4's famous near-90th-percentile bar exam result was overstated and closer to the median.
Glot500 is a 2023 model and corpus covering 511 mostly low-resource languages, pushing multilingual NLP well past the usual 100.
The 2023 Google paper introducing grouped-query attention, which shrinks the inference memory of Transformers while keeping near-full attention quality.
RWKV trains in parallel like a Transformer but runs at inference like an RNN, with constant memory per token.
The 2023 paper showing a 65B-parameter model could be finetuned on a single 48GB GPU using 4-bit quantization, putting LLM tuning within hobbyist reach.
A 2023 paper that plans all of an agent's tool calls up front instead of interleaving them, cutting token use roughly fivefold on HotpotQA.
A 2023 Berkeley paper that fine-tuned an open model to write accurate API calls and beat GPT-4 at the task.
The 2023 paper in which a GPT-4-driven agent explored Minecraft on its own, writing and saving reusable code as skills.
The 2023 Stanford paper showing preference alignment can be done with a simple classification loss, skipping the reward model and RL of classic RLHF.
A 2023 paper arguing that transformers solve multi-step tasks by matching memorized subpatterns rather than by true compositional reasoning.
The 2023 OpenAI paper showing step-by-step process supervision beats outcome supervision, and releasing the PRM800K dataset.
A 2023 quantization method that protects a small fraction of salient weights, guided by activations, to compress LLMs to 4 bits accurately.
The 2023 TII paper arguing that filtered, deduplicated web data alone can train models that beat ones trained on curated corpora like The Pile.
Meta's 2023 MusicGen generated music from text or a melody with a single transformer, and shipped open as part of AudioCraft.
The 2023 paper that introduced MT-Bench and showed GPT-4 can grade other chatbots with over 80% agreement with humans.