Two undergrads built an AI speech model to rival NotebookLM

Image Credits: ChaiyonS021 / Shutterstock


A pair of undergraduates, neither with extensive AI expertise, say they’ve made an openly available AI model that can generate podcast-style clips similar to Google’s NotebookLM.

The market for synthetic speech tools is vast and growing. ElevenLabs is one of the largest players, but there’s no shortage of challengers (see PlayAI, Sesame, and so on). Investors believe that these tools have immense potential. According to PitchBook, startups developing voice AI tech raised over $398 million in VC funding last year.

Toby Kim, one of the Korea-based co-founders of Nari Labs, the group behind the newly released model, said that he and his fellow co-founder started learning about speech AI three months ago. Inspired by NotebookLM, they wanted to create a model that offered more control over generated voices and “freedom in the script.”

Kim says they used Google’s TPU Research Cloud program, which provides researchers with free access to the company’s TPU AI chips, to train Nari’s model, Dia. Weighing in at 1.6 billion parameters, Dia can generate dialogue from a script, letting users customize speakers’ tones and insert disfluencies, coughs, laughs, and other nonverbal cues.
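To make the idea of a script with nonverbal cues concrete, here is a minimal sketch of how a two-speaker dialogue might be assembled as plain text. The `[S1]`/`[S2]` speaker tags and the parenthesized cue syntax are hypothetical, chosen for illustration only; the article does not describe Dia’s actual input format, and nothing here calls Dia itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Line:
    speaker: int                # 1-based speaker index
    text: str                   # what the speaker says
    cue: Optional[str] = None   # optional nonverbal cue, e.g. "laughs"


def render_script(lines: List[Line]) -> str:
    """Render dialogue lines as tagged text (hypothetical tag format)."""
    rendered = []
    for line in lines:
        cue = f" ({line.cue})" if line.cue else ""
        rendered.append(f"[S{line.speaker}] {line.text}{cue}")
    return "\n".join(rendered)


script = render_script([
    Line(1, "Did you hear about the new speech model?"),
    Line(2, "I did. Two undergrads built it.", cue="laughs"),
])
print(script)
```

A script laid out this way keeps speaker turns and cues explicit, which is the kind of control over the script that Kim describes.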

Parameters are the internal variables models use to make predictions. Generally, models with more parameters perform better.

Available from the AI dev platform Hugging Face and GitHub, Dia can run on most modern PCs with at least 10GB of VRAM. It generates a random voice unless prompted with a description of an intended style, but it can also clone a person’s voice.
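As a rough sense of scale, the back-of-the-envelope sketch below estimates how much memory 1.6 billion parameters occupy on their own. The storage precisions are assumptions; the article does not say which one Dia uses, and the estimate covers only the weights, not the activations or other overhead that also consume VRAM.

```python
def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


DIA_PARAMS = 1_600_000_000  # 1.6 billion parameters, as reported

# Assumed storage precisions (not stated in the article).
PRECISIONS = {"float32": 4, "float16": 2}

for name, size in PRECISIONS.items():
    print(f"{name}: ~{weight_memory_gb(DIA_PARAMS, size):.1f} GB for weights")
```

Under these assumptions the weights come to roughly 6.4 GB at 32-bit precision and 3.2 GB at 16-bit precision, both below the stated 10GB figure.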

In TechCrunch’s brief testing of Dia through Nari’s web demo, Dia worked quite well, uncomplainingly generating two-way conversations about any subject. The quality of the voices seems competitive with other tools out there, and the voice cloning function is among the easiest this reporter has tried.


Here’s a sample:

Like many voice generators, Dia offers little in the way of safeguards, however. It’d be trivially easy to craft disinformation or a scammy recording. On Dia’s project pages, Nari discourages abuse of the model to impersonate, deceive, or otherwise engage in illicit campaigns, but the group says it “isn’t responsible” for misuse.

Nari also hasn’t disclosed which data it scraped to train Dia. It’s possible Dia was developed using copyrighted content; a commenter on Hacker News notes that one sample sounds like the hosts of NPR’s “Planet Money” podcast. Training models on copyrighted content is a widespread but legally dubious practice. Some AI companies claim that fair use shields them from liability, while rights holders assert that fair use doesn’t apply to training.

In any case, Kim says Nari’s plan is to build a synthetic voice platform with a “social aspect” on top of Dia and larger, future models. Nari also intends to release a technical report for Dia, and to expand the model’s support to languages beyond English.
