Whisper V3 Turbo
accounts/fireworks/models/whisper-v3-turbo
Whisper large-v3-turbo is a fine-tuned version of a pruned Whisper large-v3. In other words, it is the same model except that the number of decoder layers has been reduced from 32 to 4. As a result, the model is significantly faster, at the cost of a minor quality degradation.
Whisper V3 Turbo is available via Fireworks' Speech-to-Text API, where you are billed based on the duration of the transcribed audio. The API supports multiple languages and additional features, including forced alignment.
Run queries immediately, pay only for usage
import requests

# Send an audio file to the transcription endpoint.
with open("audio.mp3", "rb") as f:
    response = requests.post(
        "https://audio-turbo.us-virginia-1.direct.fireworks.ai/v1/audio/transcriptions",
        headers={"Authorization": "Bearer <YOUR_API_KEY>"},
        files={"file": f},
        data={
            "model": "whisper-v3-turbo",
            "temperature": "0",
            "vad_model": "silero",  # voice activity detection model
        },
    )

if response.status_code == 200:
    print(response.json())
else:
    print(f"Error: {response.status_code}", response.text)
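The forced-alignment feature mentioned above follows the same request pattern. The sketch below is a hedged illustration rather than a verbatim API reference: the /v1/audio/alignments path and the "text" field are assumptions here, so check the Fireworks audio API documentation for the exact request shape.

import requests

# Hedged sketch of a forced-alignment request. The endpoint path and the
# "text" field name are assumptions, not confirmed by the API reference.
with open("audio.mp3", "rb") as f:
    response = requests.post(
        "https://audio-turbo.us-virginia-1.direct.fireworks.ai/v1/audio/alignments",  # assumed path
        headers={"Authorization": "Bearer <YOUR_API_KEY>"},
        files={"file": f},
        data={
            "text": "transcript text to align against the audio",  # assumed field name
        },
    )

if response.status_code == 200:
    print(response.json())  # expected to contain word- or segment-level timestamps
else:
    print(f"Error: {response.status_code}", response.text)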