Today's Deep-Dive: ELATO
Ep. 280


Episode description

The episode discusses ELATO, a project aiming to integrate sophisticated conversational AI into physical objects like toys and plushies. This technology goes beyond simple talking toys, focusing on merging hardware, software, and distinct AI personalities to create hyper-realistic interactions. The ELATO device is a small IoT gadget with a microphone and speaker, attachable to existing toys via silicone straps. Setup is designed to be simple: clip the device, connect to Wi-Fi, and choose an AI personality. Two products are available: a consumer version for $69 with unlimited AI character access and a developer kit for $59 with open-source firmware. The device boasts a week-long battery life and has garnered significant community interest. ELATO AI emphasizes the creation of over a hundred unique and often complex AI personalities, ranging from comforting characters like Dottie Mae to flamboyant figures like Captain Star Flash and dark-humored personalities like Sugar Plum. The technology relies on real-time speech-to-speech conversion, leveraging multiple AI models from providers like OpenAI and Google to ensure low latency and high-quality conversations. The architecture involves the IoT device, a fast edge server for routing AI requests, and a front-end app for character selection and customization. ELATO AI aims for under two seconds of round-trip latency, with updates delivered over-the-air. The project’s core idea is to move AI interaction from screens into physical objects, offering personalized and potentially unrestrained digital companionship, prompting reflection on the implications of designing unhinged or provocative AI companions.

Gain digital sovereignty now and save costs

Let’s have a look at your digital challenges together. What tools are you currently using? Are your processes optimal? How is the state of backups and security updates?

Digital sovereignty is easily achieved with Open Source software (which usually costs way less, too). Our division Safeserver offers hosting, operation and maintenance for countless Free and Open Source tools.

Try it now for 1 Euro - 30 days free!

Download transcript (.srt)
0:00

Welcome to the Deep Dive. Before we jump in today, a quick thank you to our

0:03

supporter, Safe Server.

0:05

Safe Server handles software hosting, and they're really focused on supporting your

0:09

digital transformation

0:10

So if you're looking for reliable hosting you can check them out at

0:13

www.safeserver.de. Like I said, today

0:18

We're embarking on a mission that feels a bit

0:22

Well, a bit sci-fi maybe. It does lean that way. We're looking at ELATO AI.

0:27

Right, and the core idea is taking sophisticated conversational AI like really

0:32

advanced stuff and getting it out of the screen

0:34

Out of your phone or computer and into physical things specifically toys plushies

0:40

even. It sounds simple

0:41

But the sources suggest this is way beyond just a talking toy. Exactly.

0:45

We're gonna break down how they're trying to merge the hardware the software and

0:48

these really distinct AI personalities

0:51

The goal seems to be making these interactions feel hyper realistic. That's the

0:56

hook isn't it?

0:57

The source material literally says they're giving plushies voices that feel

1:00

ridiculously real

1:01

Yeah, it made me think of that movie Ted the talking teddy bear, right?

1:05

But imagine that powered by actual live cutting-edge AI. That's kind of wild and

1:10

that's what we're diving into

1:11

It's sort of the next step for digital companions moving them into the physical

1:15

world. Okay, let's start with the basics: the hardware.

1:18

For anyone maybe new to this kind of tech, what is the ELATO device?

1:23

Physically? Well, at its heart, it's a small gadget, an IoT client technically. It's

1:29

got the microphone

1:30

It's got the speaker but the clever part is how it attaches. Okay, it uses two

1:35

simple silicone straps

1:36

So you can clip it on to pretty much any toy you already have. Oh, right

1:40

So you don't need to buy their specific toy. Nope that old teddy bear in the attic

1:45

Yeah, suddenly you can have, you know, a brain and a voice. That flexibility seems

1:49

like a big deal

1:50

It really is, and the setup sounds incredibly simple, aimed at, well, anyone. No tech

1:55

skills needed

1:56

How simple are we talking? Like, three steps simple. First, clip the device onto the

2:00

toy.

2:01

Okay, second connect it to your home Wi-Fi. It uses what's called a captive portal

2:06

Uh-huh, like when you connect at a hotel? Exactly that. It makes its own little

2:09

network

2:10

temporarily, to guide you. Super easy. And third, pick a character personality from

2:14

their list and just start talking to it

2:17

Wow, okay. Now I saw there are actually two different products mentioned. Yeah, they're

2:21

catering to slightly different people

2:22

There's the main AI device. That's the consumer one, right? Pre-order price

2:26

mentioned was $69. That gets you the device,

2:29

access to all the AI characters, unlimited apparently, and a free month of their

2:34

premium subscription

2:35

It's the plug-and-play version. And the other one, for tinkerers?

2:38

That's the AI dev kit, a bit cheaper, $59 on pre-order.

2:42

This one's really for developers, makers, people who want to mess around with it. How

2:47

so? It has open-source firmware,

2:48

runs over a standard USB-C connection, and lets you load your own custom voices or

2:54

even your own AI models

2:55

if you want. Much more flexible if you're technically inclined. Gotcha. And practical

2:59

things.

3:00

Battery life: is this thing always plugged in? Apparently not. They claim a week of

3:04

battery life, which is pretty good. Makes it actually portable.

3:08

Yeah, that's essential if it's meant to be a companion, and I saw something about

3:12

community support. Uh-huh, over 1,200 stars,

3:15

they said, which suggests there's already a decent buzz around it. People are

3:19

interested. That early engagement is usually a good sign.

3:22

Definitely shows people are intrigued by the idea. Yeah, and maybe even want to

3:26

build on it themselves. Okay.

3:27

Let's shift gears. The hardware is neat, but the sources really emphasize the

3:32

personalities, the who.

3:34

This seems to be where ELATO really tries to stand out. Absolutely. This isn't

3:39

just about making a toy talk

3:40

It's about giving it a very specific often complex character

3:44

They mentioned over a hundred AI characters available. A hundred? And they're not

3:49

just slight variations

3:50

Yeah, not at all. The examples they give are incredibly diverse. They seem to be

3:54

leaning into strong personalities even flawed ones

3:57

Not just a helpful assistant. Okay, give us some examples. What kind of range are we

4:01

talking? Well, you've got the comforting,

4:04

nostalgic types

4:05

Like Dottie Mae. Dottie Mae? Described as a classic Southern diner waitress. Uses

4:10

terms like hun, sweetie,

4:12

gives unsolicited advice

4:15

recommends the pie

4:17

Pure comfort food in voice form, basically. Ah, okay, so that's one end. What about

4:22

the other end? Oh, they go there. Dramatic, flamboyant characters.

4:25

There's Captain Star Flash, a super overconfident space captain who thinks lasers

4:30

solve everything. Right. Or Dr.

4:32

Voltanus, the classic mad scientist, full of manic energy. Apparently shouts catchphrases,

4:36

think loud thunder effect

4:38

So you could clip this onto, like, a superhero toy or something. Exactly, or maybe

4:42

something completely incongruous for comedic effect

4:45

And what about more thoughtful characters? Yep.

4:48

They mentioned Paradox Pithius, an ancient Greek philosopher type. Sounds wise. Wise,

4:53

but also apparently kind of smug

4:54

He answers your deep questions

4:56

with even deeper, possibly more annoying questions. Makes you think, but maybe grinds

5:02

your gears a little. Okay,

5:03

this is where that uncensored aspect might come into play, right? The comedy take, Sugar Plum.

5:07

the description is fascinating

5:08

Speaks in a super sweet, bubbly, childlike voice. Sounds innocent.

5:13

But apparently drops comments so dark it makes Satan clutch his pearls. Whoa.

5:18

Okay, that's a choice. It's intentional friction, right? That contrast creates shock

5:23

value makes it memorable

5:25

It's not trying to be bland and they seem to lean into existing pop culture stuff,

5:29

too. I saw Ted mentioned

5:30

Yeah, Ted the inappropriate Teddy. Yeah, clearly referencing the movie character

5:35

Boston accent, barfly mouth.

5:37

Can you imagine where that goes? Uncensored indeed. Any other specific types? Loads.

5:42

They mentioned Mikey Sally Sullivan, hardcore Boston guy, swearing, rants. Then there's

5:47

the proper British lad

5:49

What's his deal? Judges your tea-making skills, apologizes constantly if you bump

5:53

into him. Very specific cultural niche.

5:55

It seems like they're aiming for very defined archetypes. Totally and it's not just

5:59

comedy or stereotypes

6:00

They even list Zohran Mamdani, the political activist. Yeah, described as empathetic,

6:04

focused on social equity and justice

6:07

So the range covers serious and specific viewpoints too, not just jokes.

6:11

So the strategy isn't just make a friend

6:14

It's pick a very specific, memorable character. Exactly. Depth and distinctiveness

6:18

over just being generally agreeable

6:21

You clip it on, you get that personality fully formed. Which brings us neatly to the

6:26

how. We know the what, the device. We know

6:28

these wild personalities.

6:31

How does the tech actually pull this off in real time making a toy have a

6:35

continuous natural conversation globally?

6:38

That sounds hard. It is hard. The key seems to be what the source calls real-time

6:43

speech to speech conversion

6:45

We're talking potentially up to 15 minutes of uninterrupted chat. 15 minutes? Wow.

6:50

How? They use what the source referred to as a brain trust. They're not relying on

6:54

just one AI model

6:55

Oh, okay. So they're pulling from multiple sources, which ones? Yeah, it's quite a

6:59

list of the big names right now

7:00

OpenAI's Realtime API,

7:02

Google's Gemini Live API, ElevenLabs AI agents, and also Hume AI's EVI. Four different

7:09

ones?

7:09

Why so many? Wouldn't that be complicated? It probably is, but the idea is that each

7:13

model has strengths

7:14

Maybe one is faster, one sounds more natural, one is better at catching emotional

7:18

cues

7:19

By using several they can kind of pick the best tool for the job for each part of

7:22

the conversation or blend them

7:25

It helps keep the latency low and the quality high. Like hedging your bets. That

7:30

makes sense, redundancy and optimization.
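
The sources don't show how that per-conversation routing decision is made, but in rough terms it could look like the minimal TypeScript sketch below. The provider names mirror the ones mentioned in the episode; the character-to-provider mapping and the health check are purely illustrative assumptions, not ELATO's actual logic.

```typescript
// Minimal sketch: route each conversation to one speech-to-speech provider.
// Provider IDs follow the episode; the mapping and health check are illustrative.
type Provider = "openai-realtime" | "gemini-live" | "elevenlabs-agents" | "hume-evi";

interface CharacterProfile {
  name: string;
  preferredProvider: Provider; // the provider whose voice/latency fits the character best
  fallback: Provider;          // used if the preferred provider is slow or unavailable
}

function pickProvider(
  profile: CharacterProfile,
  isHealthy: (p: Provider) => boolean,
): Provider {
  return isHealthy(profile.preferredProvider) ? profile.preferredProvider : profile.fallback;
}

// Example: a character that prefers one provider but can fall back to another.
const dottieMae: CharacterProfile = {
  name: "Dottie Mae",
  preferredProvider: "elevenlabs-agents",
  fallback: "openai-realtime",
};

console.log(`Routing this session to: ${pickProvider(dottieMae, () => true)}`);
```

The point is simply that the choice can be made per character and per request, which is what lets the system treat several providers as interchangeable parts.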

7:32

Okay, for someone listening who isn't a developer, can we simplify the architecture?

7:36

You mentioned a triangle earlier

7:37

Yeah, think of it as three core pieces working together really fast. First,

7:41

you've got the device itself, the IoT client, that ESP32 thing

7:45

we talked about, clipped to the toy. It just captures your voice and plays the AI's

7:49

voice, and sends the audio securely using WebSockets.

7:52

Okay, piece one, the ears and mouth on the toy. Exactly. Piece two is the edge server.

7:57

This runs on something called Deno. Think of it as the super-fast traffic controller

8:01

or router. Why "edge"?

8:02

It means it's located geographically close to you and also close to the big AI

8:06

models

8:07

Its whole job is to grab the audio from the toy

8:10

Instantly fire it off to the right AI service, like Gemini or ElevenLabs, get the

8:15

response back, and zap it straight to the toy's speaker. Minimizes delay.
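
The episode only describes that relay role in words, but a stripped-down Deno sketch of it might look like the following. The upstream provider URL is a placeholder, authentication and the provider's message protocol are omitted, and only the WebSocket plumbing reflects the described design.

```typescript
// Sketch of the edge relay on Deno: accept a WebSocket from the toy, open a
// second WebSocket to an AI provider, and pipe audio frames in both directions.
// The upstream URL is a placeholder; a real provider needs auth and a protocol.
Deno.serve((req: Request) => {
  if (req.headers.get("upgrade") !== "websocket") {
    return new Response("expected a WebSocket upgrade", { status: 400 });
  }

  const { socket: toy, response } = Deno.upgradeWebSocket(req);
  const upstream = new WebSocket("wss://ai-provider.example.com/realtime"); // placeholder

  toy.binaryType = "arraybuffer";
  upstream.binaryType = "arraybuffer";

  // Toy microphone audio -> AI provider (frames arriving before connect are dropped).
  toy.onmessage = (e) => {
    if (upstream.readyState === WebSocket.OPEN) upstream.send(e.data);
  };

  // AI reply audio -> toy speaker.
  upstream.onmessage = (e) => {
    if (toy.readyState === WebSocket.OPEN) toy.send(e.data);
  };

  // If either side goes away, tear down both connections.
  const closeBoth = () => {
    toy.close();
    upstream.close();
  };
  toy.onclose = closeBoth;
  upstream.onclose = closeBoth;

  return response;
});
```

Running that relay close to both the user and the model providers is what keeps each hop short.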

8:18

Got it, the middleman ensuring speed. And the third piece? That's the

8:25

front end

8:25

Basically the website or app you use, built with Next.js. This is where you choose

8:30

your characters

8:31

Maybe create custom ones, adjust the volume, that kind of thing. Ah, and I saw you

8:34

can tweak the pitch

8:35

Yeah, the pitch factor so you could take a serious character's voice and make it

8:39

sound high pitched and cartoonish if you wanted more

8:42

Customization. Okay. So the whole thing relies on speed. If there's a big delay, it

8:46

ruins the illusion of conversation

8:48

What kind of performance are they claiming? The numbers are pretty impressive,

8:51

especially for a global system

8:52

They're aiming for under two seconds of round-trip latency. Under two seconds, from you

8:58

speaking to hearing the reply?

9:00

Yeah, which is generally fast enough to feel pretty conversational, not like a walkie-talkie.
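
As a rough way to sanity-check a claim like that from the client side, you could timestamp the end of your speech and the arrival of the first reply audio frame. Everything in this sketch, the endpoint URL and the end-of-speech message, is hypothetical, not ELATO's actual protocol.

```typescript
// Hypothetical round-trip check: stamp the moment the user stops speaking and
// the moment the first reply audio frame arrives. URL and message are placeholders.
const session = new WebSocket("wss://edge.example.com/session"); // placeholder endpoint
session.binaryType = "arraybuffer";

let stoppedSpeakingAt: number | null = null;

// Call this from your VAD or push-to-talk handler when the user finishes a turn.
function onUserStoppedSpeaking(): void {
  stoppedSpeakingAt = performance.now();
  session.send(JSON.stringify({ type: "input_audio_done" })); // hypothetical message
}

session.onmessage = (event) => {
  if (event.data instanceof ArrayBuffer && stoppedSpeakingAt !== null) {
    const roundTripMs = performance.now() - stoppedSpeakingAt;
    console.log(`Round trip: ${roundTripMs.toFixed(0)} ms (claimed target: under 2000 ms)`);
    stoppedSpeakingAt = null;
  }
};
```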

9:05

And the audio quality? Does it sound clear?

9:07

They mentioned using the Opus codec at 12 kilobits per second.

9:11

Which in non-technical terms means it should sound pretty clear and crisp even

9:17

though they're keeping the data rate low for speed
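
The device itself encodes on its own hardware, so the snippet below isn't ELATO code; it's just a quick browser-side way to hear roughly what 12 kbps Opus sounds like, using the standard MediaRecorder API.

```typescript
// Browser-only illustration (not ELATO firmware): record the microphone as
// Opus at roughly 12 kbps to get a feel for audio quality at that bitrate.
async function recordLowBitrateOpus(): Promise<MediaRecorder> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream, {
    mimeType: "audio/webm;codecs=opus", // Opus, as mentioned in the episode
    audioBitsPerSecond: 12_000,         // ~12 kbps target bitrate
  });
  recorder.ondataavailable = (e) => console.log(`encoded chunk: ${e.data.size} bytes`);
  recorder.start(1000); // emit one encoded chunk per second
  return recorder;
}
```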

9:19

Okay, one more tech thing. How does it know when I've finished talking? Do I have

9:24

to press a button? No, and that's crucial

9:26

They use something called Server VAD, voice activity detection. Server VAD?

9:31

Right, instead of the little device trying to guess, the powerful server analyzes

9:36

the audio stream in real time

9:38

It figures out precisely when you've naturally paused or finished speaking. Ah, so

9:42

it makes turn-taking much smoother

9:44

Exactly, less awkward silence, fewer interruptions, key for making it feel real.
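
To make that concrete, this is roughly the shape of a "turn on server VAD" message for OpenAI's Realtime API, one of the providers the episode mentions; the numeric values are example settings, not ELATO's actual configuration.

```typescript
// Illustrative session update for OpenAI's Realtime API: let the server detect
// when the speaker has finished (server VAD). Values are examples only.
const enableServerVad = {
  type: "session.update",
  session: {
    turn_detection: {
      type: "server_vad",        // the server decides when the user stopped talking
      threshold: 0.5,            // speech-detection sensitivity (example)
      prefix_padding_ms: 300,    // audio kept from just before speech starts (example)
      silence_duration_ms: 500,  // pause length that counts as end of turn (example)
    },
  },
};

// Sent over the provider's realtime WebSocket once the session is open, e.g.:
// realtimeSocket.send(JSON.stringify(enableServerVad));
console.log(JSON.stringify(enableServerVad, null, 2));
```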

9:50

Plus they mentioned OTA updates. Over the air?

9:52

Yeah, means the software on the device can be updated automatically over Wi-Fi

9:56

So it can get better over time without you needing to plug it into a computer. Okay,

10:00

so putting it all together

10:01

It's quite an ambitious project. Merging these very specific, sometimes wild

10:06

personalities with hardware that enables smooth, fast

10:09

conversation. It really is. The big takeaway seems to be shifting AI interaction

10:14

away from just typing in a box. And

10:16

into a physical object you can actually talk with. Like, really talk with. Whether

10:21

you want that companion to be a nurturing waitress like Dottie Mae, or

10:24

a sarcastic philosopher or an inappropriate teddy bear. Right, it's that

10:29

customization delivered through a physical form.

10:32

So the final thought for you listening, the source emphasizes this device has no

10:37

filters, no rules.

10:39

We have the tech now to give an innocent looking plushie a voice that could be,

10:44

well,

10:44

deliberately offensive like Ted, or shockingly dark like Sugar Plum, or maybe even

10:48

politically charged.

10:50

If digital companionship becomes totally personalized and unrestrained, what does

10:55

that mean?

10:55

What happens when we start designing companions not to be helpful or polite, but

10:59

maybe

10:59

unhinged?

11:01

Provocative. Something to think about as this tech develops.

11:04

Well, that's all we have time for on this deep dive and thanks again to our

11:07

supporters Safe Server.

11:08

Remember, they handle software hosting and support digital transformation.

11:12

sources.
