ElevenLabs is launching its own speech-to-text model


ElevenLabs, an AI startup that just raised a $180 million mega-funding round, has been mostly known for its audio-generation tech. The company took a step in another technical direction by launching its first stand-alone speech-to-text model, called Scribe.

The startup, valued at $3.3 billion, has helped many other companies provide text-to-speech services through its vast library of voices. However, the company is now looking to get into speech recognition and compete with the likes of Gladia, Speechmatics, AssemblyAI, Deepgram, and OpenAI's Whisper models.

ElevenLabs' Scribe model supports over 99 languages at launch. The company places over 25 of those languages in an "excellent accuracy" category, meaning the model's word error rate for them is less than 5%. This list includes English (claimed accuracy rate of 97%), French, German, Hindi, Indonesian, Japanese, Kannada, Malayalam, Polish, Portuguese, Spanish, and Vietnamese. Other languages are ranked in different categories: high (5% to 10% word error rate), good (10% to 20% word error rate), and moderate (25% to 50% word error rate).
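Word error rate (WER) is the standard metric behind these categories: the minimum number of word substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The sketch below computes WER with a word-level edit distance and maps the result onto the tiers described above. The function names are hypothetical, and the handling of tier boundaries (for example, whether exactly 5% counts as "excellent" or "high") is an assumption, since ElevenLabs' exact cutoffs and scoring pipeline are not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from transcript
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # substitution (or exact match when cost is 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def accuracy_tier(wer: float) -> str:
    """Hypothetical mapping of WER onto the tiers named in the article."""
    if wer < 0.05:
        return "excellent"
    if wer <= 0.10:
        return "high"
    if wer <= 0.20:
        return "good"
    if 0.25 <= wer <= 0.50:
        return "moderate"
    # The article gives no tier for 20% to 25% or above 50%
    return "unclassified"


wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(round(wer, 3), accuracy_tier(wer))  # one substitution in six words
```

With one wrong word out of six, the WER is about 16.7%, which falls in the "good" band; a language in the "excellent" band would average fewer than one error per twenty reference words.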

The company said that the model outperforms Google Gemini 2.0 Flash and Whisper Large V3 across multiple languages in the FLEURS and Common Voice benchmark tests.

ElevenLabs had developed the speech-to-text component for its AI conversational agent platform, which was released last year. However, this is the first time the company is releasing a stand-alone speech recognition model. In a conversation with TechCrunch last month, CEO Mati Staniszewski talked about improving speech recognition models.

"We want to understand what's being said by you in a conversation better. We are working on ways to move away from only generating content and toward understanding and transcribing speech," Staniszewski said at that time. "Many people say that speech-to-text is a solved problem. But for many languages, it is pretty bad. We think we can build better speech recognition models because we have in-house teams to annotate data and give us quick feedback."

The model also offers smart speaker diarization to distinguish who is speaking, word-level timestamps for accurate subtitles, and auto-tagging of sound events like audience laughter. The startup is also giving customers a way to directly transcribe video content in its studio to add subtitles or captions.


Scribe currently works only with pre-recorded audio. The company said it will release a low-latency, real-time version of the model soon. Until then, it is not yet suited for live meeting transcription or voice note-taking.

ElevenLabs is pricing Scribe at $0.40 per hour of transcribed audio. While the rate is competitive, some of its rivals offer a lower price for audio transcription at the moment, with some differences in features.
