“DDSP: Differentiable Digital Signal Processing,” submitted to arXiv on January 14, 2020 by Jesse Engel, Lamtharn Hantrakul, Chenjie Gu, and Adam Roberts of Google’s Magenta team, took a different path from the brute-force audio models of the time. Instead of asking a large neural network to predict the raw waveform from scratch, DDSP wires classic signal-processing components (oscillators, filters, and reverberation) directly into the network and lets the network control them.
The trick is that these synthesizer building blocks are written so that gradients can flow through them, which means a network can learn to drive them during ordinary training. Because the model is built from interpretable parts, each piece can be manipulated on its own: you can shift pitch, change loudness, extrapolate beyond the training range, remove reverb, or transfer the timbre of one instrument onto another. The paper showed that this approach achieves high-fidelity audio without the giant autoregressive models or adversarial losses that competing methods relied on, and the authors released the code as an open library.
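To make "gradients can flow through them" concrete, here is a toy sketch in plain Python. It is not the DDSP library's API, and it simplifies heavily: it adjusts the amplitudes of a small bank of sine oscillators directly by gradient descent rather than training a neural network to predict them, and it uses a plain mean squared error on the waveform where the paper uses a spectral loss. The sample rate, pitch, step count, and all function names are assumptions chosen for illustration.

```python
import math

# Hypothetical settings, chosen only to keep the example small and fast.
SAMPLE_RATE = 8000   # samples per second
N_SAMPLES = 400      # 50 ms of audio
F0 = 200.0           # fundamental frequency in Hz


def harmonic_basis(n_harmonics):
    """One sine wave per harmonic k = 1..n_harmonics, sampled at SAMPLE_RATE."""
    return [
        [math.sin(2.0 * math.pi * k * F0 * n / SAMPLE_RATE) for n in range(N_SAMPLES)]
        for k in range(1, n_harmonics + 1)
    ]


def synthesize(amplitudes, basis):
    """Additive synthesizer: a weighted sum of the harmonic sine waves."""
    return [
        sum(a * row[n] for a, row in zip(amplitudes, basis))
        for n in range(N_SAMPLES)
    ]


def mse_and_grad(amplitudes, basis, target):
    """Mean squared error against the target and its gradient per amplitude.

    The output is linear in each amplitude, so the gradient is derived by
    hand here: d(loss)/d(a_k) = (2/N) * sum_n error[n] * sin_k[n].
    """
    audio = synthesize(amplitudes, basis)
    err = [y - t for y, t in zip(audio, target)]
    loss = sum(e * e for e in err) / N_SAMPLES
    grad = [2.0 * sum(e * s for e, s in zip(err, row)) / N_SAMPLES for row in basis]
    return loss, grad


def fit_amplitudes(target, n_harmonics=3, steps=200, lr=0.5):
    """Recover oscillator amplitudes from audio by plain gradient descent."""
    basis = harmonic_basis(n_harmonics)
    amplitudes = [0.0] * n_harmonics
    for _ in range(steps):
        _, grad = mse_and_grad(amplitudes, basis, target)
        amplitudes = [a - lr * g for a, g in zip(amplitudes, grad)]
    return amplitudes


# Build a target sound with known amplitudes, then learn them back from audio.
true_amplitudes = [1.0, 0.5, 0.25]
target = synthesize(true_amplitudes, harmonic_basis(3))
learned = fit_amplitudes(target)
final_loss, _ = mse_and_grad(learned, harmonic_basis(3), target)
```

The learned values are themselves the interpretable controls: scaling one entry of `learned` changes the strength of that harmonic and nothing else, which is the kind of independent manipulation the paragraph above describes.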
Why business readers should care: DDSP is a clean example of fusing decades of domain engineering with deep learning rather than discarding it. The hybrid is smaller, more controllable, and more interpretable than an end-to-end black box, qualities that matter whenever a creative tool needs predictable, tweakable behavior.