Neural transducers are commonly used for automatic speech recognition (ASR), often achieving state-of-the-art results for quality and inference speed; for instance, they power Google's offline ASR engine. In this post, I'd like to
This is a continuation of Part 1 of this two-part series. In this post, I'll go over the implementation of PQMF filters in enough detail that you'll be able to
In the past year or so, several papers have investigated using sub-band coding with neural vocoders to model audio and accelerate inference: FFTNet with sub-band coding, WaveNet with sub-band coding, and the DurIan TTS System.
A deep dive into several Facebook publications about knowledge-augmented language tasks, such as question answering and entity linking.
In this post, I'll derive the equations for DiffWave and WaveGrad using diffusion probabilistic processes.
DiffWave and WaveGrad propose a new neural vocoder model based on diffusion probabilistic processes, with several nice properties and a solid theoretical justification.