Whisper JAX Is A New Transcription API That Claims To Be 70 Times Faster Than Whisper

Transcription services have become fairly commonplace, and models are now competing on speed.

Whisper JAX is a new transcription API that claims to be 70 times faster than Whisper. It is an implementation of OpenAI’s open-source Whisper model, and is said to transcribe 1 hour of audio in 15 seconds. Whisper JAX is available on Hugging Face, along with a nifty demo that lets users try out the transcription for themselves.

We gave Whisper JAX a spin, and its transcription seems to be pretty good, even with non-native accents. The service displays how long it took to convert the audio file to text, and Whisper JAX managed to transcribe a few lines of speech in 0.34 seconds. The text output held up well even when there was background noise in the audio input.

Sanchit Gandhi, who’s behind Whisper JAX, explained how the API manages to be 70x faster than Whisper. “The 70x speed gain we see comes in three stages: 1. Batching over un-batched 2. JAX over PyTorch 3. TPUs over GPUs,” he tweeted.

“Huggingface Transformers implements a batching algorithm where a single audio sample is chunked into 30s segments, and then chunks transcribed in batches. This batching algorithm gives up to a 7x gain over OpenAI (which transcribes chunks sequentially),” he explained.
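The chunk-then-batch idea in that quote can be sketched in a few lines of plain Python. This is only an illustration, not the Transformers or Whisper JAX code: the 16 kHz sample rate, the batch size of 8, and the helper names (`chunk_audio`, `batched`, `transcribe_batch`) are assumptions made up for the example, and `transcribe_batch` is a placeholder where a real model call would go.

```python
from typing import Iterator, List, Sequence

SAMPLE_RATE = 16_000   # assumption: 16 kHz mono audio
CHUNK_SECONDS = 30     # from the quote: 30-second segments


def chunk_audio(samples: Sequence[float],
                sample_rate: int = SAMPLE_RATE,
                chunk_seconds: int = CHUNK_SECONDS) -> List[Sequence[float]]:
    """Split one long recording into consecutive fixed-length chunks."""
    chunk_len = sample_rate * chunk_seconds
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


def batched(chunks: List[Sequence[float]],
            batch_size: int) -> Iterator[List[Sequence[float]]]:
    """Group chunks so that several are handed to the model at once."""
    for i in range(0, len(chunks), batch_size):
        yield chunks[i:i + batch_size]


def transcribe_batch(batch: List[Sequence[float]]) -> List[str]:
    """Hypothetical placeholder for a single model call over a whole batch."""
    return [f"<text for {len(chunk)} samples>" for chunk in batch]


def transcribe(samples: Sequence[float], batch_size: int = 8) -> str:
    """Chunk the audio, transcribe batch by batch, and join the pieces."""
    texts: List[str] = []
    for batch in batched(chunk_audio(samples), batch_size):
        texts.extend(transcribe_batch(batch))
    return " ".join(texts)
```

A sequential transcriber would call the model once per chunk, while the batched version calls it once per group of chunks, which is where the quoted gain would come from on parallel hardware. A real pipeline also has to deal with words that straddle chunk boundaries, which this sketch ignores.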

“JAX is an automatic differentiation library for high-performance machine learning research. By Just-In Time (JIT) compiling Whisper, we get a 2x speed-up vs Transformers PyTorch on GPU,” he added.

“Tensor Processing Units (TPUs) are ML accelerators designed by Google. TPUs are purpose built for matrix multiplications, giving them a significant advantage over more general GPUs. The result? Running Whisper JAX on TPU v4-8 is 5x faster than on an NVIDIA A100,” he continued.
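For readers who want to check the numbers, the three quoted gains can be combined in a quick back-of-the-envelope calculation. The sketch assumes the stages are independent and simply multiply, and it treats the "up to 7x" batching figure as a full 7x, so the result is best read as an upper bound. The variable names are ours.

```python
from math import prod

# Per-stage gains as quoted in the tweets.
stage_speedups = {
    "batching over un-batched": 7,   # quoted as "up to 7x"
    "JAX over PyTorch": 2,
    "TPU v4-8 over NVIDIA A100": 5,
}

# Assumption: the stages are independent, so their gains multiply.
total_speedup = prod(stage_speedups.values())

# The headline claim: 1 hour of audio transcribed in 15 seconds.
audio_seconds = 60 * 60
transcribe_seconds = 15
real_time_factor = audio_seconds / transcribe_seconds

print(f"combined speedup: {total_speedup}x")         # combined speedup: 70x
print(f"real-time factor: {real_time_factor:.0f}x")  # real-time factor: 240x
```

In other words, at the claimed rate the model processes audio about 240 times faster than it plays back.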

All these changes multiply out to a 7 × 2 × 5, or 70x, speed improvement over Whisper. Transcription services are quite handy for creators, allowing them to add subtitles to their videos and other content. Subtitling used to be a whole job function, with entire companies devoted to providing professional subtitling, but with recent advances in AI and new initiatives like Whisper JAX, it’s unlikely that humans will need to do subtitling by hand in the coming years.