The Underlying Technology


The underlying technology, Whisper, was released by OpenAI as an open-source tool and is described in detail in a paper titled "Robust Speech Recognition via Large-Scale Weak Supervision". The paper highlights how Whisper differs from prior models.

A model in machine learning is a computer program that has been trained on a large amount of data to learn patterns and relationships within the data. These models are used to make predictions or decisions based on new data that they have not seen before.
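To make the idea of "learning patterns from data" concrete, here is a minimal sketch of a very small model: a straight line fitted to a handful of example points, then used to predict a value it has not seen. The data and the function names `fit_line` and `predict` are made up for illustration; real speech models such as Whisper are vastly larger and work differently, but the train-then-predict pattern is the same.

```python
def fit_line(xs, ys):
    """Learn a slope and intercept from example (x, y) pairs via least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, divided by how much x varies on its own.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned line to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Illustrative training data that follows the pattern y = 2x + 1.
train_x = [1, 2, 3, 4]
train_y = [3, 5, 7, 9]

model = fit_line(train_x, train_y)  # "training": learn the pattern
print(predict(model, 10))           # "prediction" on unseen input: prints 21.0
```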

The study goes on to describe how Whisper was trained simply to predict the transcripts of a large amount of audio available on the internet, rather than being tuned for any single benchmark. The resulting models performed well on standard speech processing benchmarks without any fine-tuning.

When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks … When compared to humans, the models approach their accuracy and robustness.

Robust Speech Recognition via Large-Scale Weak Supervision

This is even more significant because the researchers behind Whisper released the trained models and code to the public. Check out James Somers's article in The New Yorker, "Whispers of A.I.’s Modular Future", which goes into more detail about how significant this may be for the future of technology and business.
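Because the models and code are public, anyone can try transcription on their own computer. The following is a minimal sketch, assuming the open-source `openai-whisper` Python package and the ffmpeg tool are installed. The helper name `transcribe_file`, the file name `interview.mp3`, and the choice of the `base` model size are illustrative assumptions, not something prescribed by the paper.

```python
def transcribe_file(audio_path, model=None, model_name="base"):
    """Return the transcript of an audio file as plain text.

    If no model is passed in, load a pretrained Whisper model by name.
    (Assumes `pip install openai-whisper` and ffmpeg are available.)
    """
    if model is None:
        import whisper  # imported lazily so the helper can be defined without it
        model = whisper.load_model(model_name)
    # transcribe() returns a dictionary; the full transcript is under "text".
    result = model.transcribe(audio_path)
    return result["text"].strip()


# Example usage (hypothetical file name):
# print(transcribe_file("interview.mp3"))
```

The model is loaded on first use and the download can take a while. Larger model sizes are generally more accurate but slower.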

Read more about the underlying technology on OpenAI’s blog.