1
00:00:00,000 --> 00:00:03,800
Welcome to the Deep Dive, where we tear through the latest research to make sure

2
00:00:03,800 --> 00:00:04,600
you, the

3
00:00:04,600 --> 00:00:08,680
learner, get the critical knowledge without all the jargon.

4
00:00:08,680 --> 00:00:12,760
And today, we're looking at something that I think really is a fundamental shift.

5
00:00:12,760 --> 00:00:13,760
I agree.

6
00:00:13,760 --> 00:00:17,040
We are not talking about your usual chatbot today.

7
00:00:17,040 --> 00:00:21,780
We're tackling a concept that sounds like science fiction, AI that doesn't just

8
00:00:21,780 --> 00:00:22,020
talk

9
00:00:22,020 --> 00:00:24,200
to you, but actually works for you.

10
00:00:24,200 --> 00:00:28,380
I mean, an AI that can see your screen, navigate your apps, click buttons, type

11
00:00:28,380 --> 00:00:29,920
things out.

12
00:00:29,920 --> 00:00:33,840
Basically, do what a human operator does, but do it safely.

13
00:00:33,840 --> 00:00:35,720
And that's the key transition point.

14
00:00:35,720 --> 00:00:38,200
For years, this was all theoretical.

15
00:00:38,200 --> 00:00:42,820
But now, with the right kind of infrastructure, these agents can finally run these

16
00:00:42,820 --> 00:00:43,360
complex

17
00:00:43,360 --> 00:00:45,480
workflows across a bunch of different apps.

18
00:00:45,480 --> 00:00:47,320
Like what are we talking about here?

19
00:00:47,320 --> 00:00:48,320
Oh, anything.

20
00:00:48,320 --> 00:00:52,650
Editing an image in Photoshop, handling a checkout on Amazon, or even filing a

21
00:00:52,650 --> 00:00:53,160
complex

22
00:00:53,160 --> 00:00:55,760
report that uses three different company tools.

23
00:00:55,760 --> 00:00:58,920
And the piece of infrastructure making this all possible is our

24
00:00:58,920 --> 00:00:59,840
focus today.

25
00:00:59,840 --> 00:01:02,320
Containers for computer use agents.

26
00:01:02,320 --> 00:01:04,040
Most people just call it CUA.

27
00:01:04,040 --> 00:01:07,000
It's the security layer, the scaling framework.

28
00:01:07,000 --> 00:01:11,040
It's what gets these powerful AIs out of the lab and securely onto your desktop.

29
00:01:11,040 --> 00:01:18,670
So our mission today is to unpack how we go from a basic command line AI to these

30
00:01:18,670 --> 00:01:19,400
agents

31
00:01:19,400 --> 00:01:24,160
that can control entire operating systems, macOS, Linux, Windows, and do it

32
00:01:24,160 --> 00:01:24,720
without

33
00:01:24,720 --> 00:01:25,720
breaking anything.

34
00:01:25,720 --> 00:01:27,400
OK, let's unpack this.

35
00:01:27,400 --> 00:01:32,120
But first, a quick note: this deep dive is supported by SafeServer.

36
00:01:32,120 --> 00:01:35,630
SafeServer manages the hosting for this exact kind of cutting-edge software, and

37
00:01:35,630 --> 00:01:36,080
they can

38
00:01:36,080 --> 00:01:38,400
support your digital transformation needs.

39
00:01:38,400 --> 00:01:41,850
So if you're looking for reliable hosting that can handle this next generation of

40
00:01:41,850 --> 00:01:42,540
computing,

41
00:01:42,540 --> 00:01:44,840
you can find out more at www.safeserver.de.

42
00:01:44,840 --> 00:01:49,200
All right, so to start, let's just get a really clear definition down.

43
00:01:49,200 --> 00:01:50,480
What is a computer use agent?

44
00:01:50,480 --> 00:01:52,080
Yeah, let's ground ourselves.

45
00:01:52,080 --> 00:01:57,570
It's an AI designed to do tasks by observing and interacting with a normal desktop

46
00:01:57,570 --> 00:01:58,240
environment.

47
00:01:58,240 --> 00:02:00,520
Just think of it as digital eyes and a robotic hand.

48
00:02:00,520 --> 00:02:04,920
OK, so it's using simulated mouse clicks and keyboard commands to get things done.

49
00:02:04,920 --> 00:02:05,920
Exactly.

50
00:02:05,920 --> 00:02:07,720
Things that usually need a human watching over them.

51
00:02:07,720 --> 00:02:09,360
That sounds incredibly powerful.

52
00:02:09,360 --> 00:02:13,120
I mean, it unlocks basically every piece of software that already exists.

53
00:02:13,120 --> 00:02:17,100
But the second you say you're giving an experimental AI the ability to click and

54
00:02:17,100 --> 00:02:18,240
type on my machine,

55
00:02:18,240 --> 00:02:21,840
a huge alarm bell just starts ringing in my head.

56
00:02:21,840 --> 00:02:22,840
It should.

57
00:02:22,840 --> 00:02:26,000
What's to stop it from just, you know, doing real damage?

58
00:02:26,000 --> 00:02:30,520
And that is the immediate non-negotiable problem that CUA was built to solve.

59
00:02:30,520 --> 00:02:32,680
The sources really emphasize this.

60
00:02:32,680 --> 00:02:37,450
Running these powerful, sometimes unpredictable agents locally is just, it's

61
00:02:37,450 --> 00:02:39,320
dangerous.

62
00:02:39,320 --> 00:02:40,320
How dangerous?

63
00:02:40,320 --> 00:02:44,480
There's this one anecdote that gets shared a lot from the early days of development.

64
00:02:44,480 --> 00:02:48,330
One of the teams had an agent set up that, and this is a quote, "broke my computer,

65
00:02:48,330 --> 00:02:48,840
preventing

66
00:02:48,840 --> 00:02:50,400
disk writing."

67
00:02:50,400 --> 00:02:53,840
That's, that is a developer's absolute worst nightmare.

68
00:02:53,840 --> 00:02:57,720
So you've got a super smart agent, but you can't trust it not to just brick your

69
00:02:57,720 --> 00:02:58,120
whole

70
00:02:58,120 --> 00:02:59,120
system.

71
00:02:59,120 --> 00:03:00,120
Precisely.

72
00:03:00,120 --> 00:03:03,760
You can't just hand over the keys to your entire operating system to a tool that is,

73
00:03:03,760 --> 00:03:05,280
at its core, experimental.

74
00:03:05,280 --> 00:03:08,870
And what's fascinating here is that the community saw this risk immediately and

75
00:03:08,870 --> 00:03:09,560
demanded some

76
00:03:09,560 --> 00:03:10,840
kind of containment.

77
00:03:10,840 --> 00:03:12,720
CUA provides that safety.

78
00:03:12,720 --> 00:03:16,340
And the source material has a great analogy for it, which is CUA is effectively

79
00:03:16,340 --> 00:03:16,760
Docker

80
00:03:16,760 --> 00:03:18,440
for computer use agents.

81
00:03:18,440 --> 00:03:23,020
I love that analogy, but Docker is usually for, you know, server stuff, right?

82
00:03:23,020 --> 00:03:25,840
Processes without a graphical interface.

83
00:03:25,840 --> 00:03:29,760
Virtualizing a full desktop with a mouse and windows and video inside a container

84
00:03:29,760 --> 00:03:30,200
sounds

85
00:03:30,200 --> 00:03:31,200
way harder.

86
00:03:31,200 --> 00:03:32,200
It is much harder.

87
00:03:32,200 --> 00:03:33,560
And that's the key distinction.

88
00:03:33,560 --> 00:03:35,760
CUA doesn't just isolate a process.

89
00:03:35,760 --> 00:03:38,200
It isolates the entire interactive environment.

90
00:03:38,200 --> 00:03:43,690
It lets these agents control a full operating system, macOS, Linux, Windows, inside a

91
00:03:43,690 --> 00:03:44,040
secure

92
00:03:44,040 --> 00:03:45,160
virtual machine.

93
00:03:45,160 --> 00:03:46,840
So the isolation is total.

94
00:03:46,840 --> 00:03:47,840
Comprehensive.

95
00:03:47,840 --> 00:03:51,200
It makes sure the agent's actions stay inside that contained VM.

96
00:03:51,200 --> 00:03:52,740
So no data leaks out.

97
00:03:52,740 --> 00:03:55,780
And critically, the agent can't damage your main system.

98
00:03:55,780 --> 00:03:59,070
It's the framework that makes the whole idea of an intelligent agent actually

99
00:03:59,070 --> 00:03:59,600
functional

100
00:03:59,600 --> 00:04:00,600
and reliable.

101
00:04:00,600 --> 00:04:01,600
OK.

102
00:04:01,600 --> 00:04:07,090
So CUA is the foundation that makes this whole leap in human-computer interaction

103
00:04:07,090 --> 00:04:07,640
safe.

104
00:04:07,640 --> 00:04:11,730
It lets us build tools that really adapt to us, which should make technology feel

105
00:04:11,730 --> 00:04:12,040
more

106
00:04:12,040 --> 00:04:13,040
intuitive.

107
00:04:13,040 --> 00:04:15,020
So let's get into the how.

108
00:04:15,020 --> 00:04:19,320
How does the architecture pull off this mix of security and, I assume, high

109
00:04:19,320 --> 00:04:20,360
performance?

110
00:04:20,360 --> 00:04:21,360
Right.

111
00:04:21,360 --> 00:04:22,360
The architecture.

112
00:04:22,360 --> 00:04:25,640
So security is handled through what they call local sandboxes.

113
00:04:25,640 --> 00:04:29,200
Every single thing the agent does runs in this isolated environment, whether it's a

114
00:04:29,200 --> 00:04:30,940
full VM or a container.

115
00:04:30,940 --> 00:04:33,440
And that just guarantees privacy and security.

116
00:04:33,440 --> 00:04:34,440
It does.

117
00:04:34,440 --> 00:04:38,130
For instance, if the agent messes up and deletes a system file, that action is

118
00:04:38,130 --> 00:04:39,220
completely confined

119
00:04:39,220 --> 00:04:40,220
to the virtual world.

120
00:04:40,220 --> 00:04:42,120
Your actual computer is totally untouched.

121
00:04:42,120 --> 00:04:46,260
And the sources mentioned a specific performance thing, an optimization, that

122
00:04:46,260 --> 00:04:47,000
seemed really

123
00:04:47,000 --> 00:04:49,840
aimed at developers using the newest hardware.

124
00:04:49,840 --> 00:04:50,840
That's right.

125
00:04:50,840 --> 00:04:53,800
CUA is highly, highly optimized for Apple Silicon.

126
00:04:53,800 --> 00:04:57,200
This is a really critical design choice for developers who are doing a lot of local

127
00:04:57,200 --> 00:04:57,880
testing.

128
00:04:57,880 --> 00:05:02,920
Because by taking advantage of the M series chips, CUA gets, and I'm quoting here,

129
00:05:02,920 --> 00:05:03,500
"blazing

130
00:05:03,500 --> 00:05:06,660
fast performance and energy efficiency."

131
00:05:06,660 --> 00:05:11,580
You can run the agent and its simulated computer at the same time on one laptop.

132
00:05:11,580 --> 00:05:12,580
Really fast.

133
00:05:12,580 --> 00:05:16,180
That makes total sense for a developer testing on their own machine.

134
00:05:16,180 --> 00:05:19,300
But it does bring up a pretty big question for bigger companies.

135
00:05:19,300 --> 00:05:23,550
If you optimize so heavily for Apple Silicon, don't you create a kind of vendor

136
00:05:23,550 --> 00:05:24,080
lock-in

137
00:05:24,080 --> 00:05:29,340
problem for enterprises that use, you know, huge Linux or Windows server farms?

138
00:05:29,340 --> 00:05:33,440
That is an excellent and a really critical question.

139
00:05:33,440 --> 00:05:37,550
The way the framework handles this is by using that optimization as a development

140
00:05:37,550 --> 00:05:38,160
booster,

141
00:05:38,160 --> 00:05:39,560
but building the system to be cross-platform.

142
00:05:39,560 --> 00:05:40,560
OK.

143
00:05:40,560 --> 00:05:41,560
So how does that work?

144
00:05:41,560 --> 00:05:45,420
Well, under the hood, it combines a super optimized macOS virtual machine with a

145
00:05:45,420 --> 00:05:45,700
more

146
00:05:45,700 --> 00:05:48,180
generic Python control interface.

147
00:05:48,180 --> 00:05:51,960
So while your local development is extra fast on an M series Mac, the framework

148
00:05:51,960 --> 00:05:52,540
itself is

149
00:05:52,540 --> 00:05:57,080
designed to manage environments for macOS, Linux using Docker, and Windows using

150
00:05:57,080 --> 00:05:57,260
its

151
00:05:57,260 --> 00:05:58,260
own sandboxes.

152
00:05:58,260 --> 00:06:01,350
Ah, so you can prototype really fast locally and deploy it to whatever cloud

153
00:06:01,350 --> 00:06:02,000
environment

154
00:06:02,000 --> 00:06:03,000
you need.

155
00:06:03,000 --> 00:06:04,000
Exactly.

156
00:06:04,000 --> 00:06:05,000
That makes sense.

157
00:06:05,000 --> 00:06:07,490
But for a developer, that still sounds like a lot to manage: VMs, containers,

158
00:06:07,490 --> 00:06:07,940
different

159
00:06:07,940 --> 00:06:11,920
operating systems, Python. How do you interact with all of that without just

160
00:06:11,920 --> 00:06:12,660
spending all

161
00:06:12,660 --> 00:06:14,380
your time on infrastructure?

162
00:06:14,380 --> 00:06:17,220
That is the whole job of the computer SDK.

163
00:06:17,220 --> 00:06:22,230
You should think of the computer SDK as like the unified robotic hand that controls

164
00:06:22,230 --> 00:06:23,020
everything

165
00:06:23,020 --> 00:06:24,680
in that virtual environment.

166
00:06:24,680 --> 00:06:25,680
It hides all that complexity.

167
00:06:25,680 --> 00:06:30,060
So I don't have to worry if I'm talking to a Linux container or a macOS VM.

168
00:06:30,060 --> 00:06:31,060
Nope.

169
00:06:31,060 --> 00:06:33,340
You just use one consistent, simple Python API.

170
00:06:33,340 --> 00:06:34,340
Yeah.

171
00:06:34,340 --> 00:06:36,220
It's actually a lot like PyAutoGUI if you've ever used that.

172
00:06:36,220 --> 00:06:40,510
Oh, okay. So if I know how to automate a simple click with a tool like that, the

173
00:06:40,510 --> 00:06:40,860
computer

174
00:06:40,860 --> 00:06:42,380
SDK lets me do the same thing.

175
00:06:42,380 --> 00:06:46,580
Click, type, scroll, but safely inside the monitored CUA container.

176
00:06:46,580 --> 00:06:47,580
Precisely.

177
00:06:47,580 --> 00:06:51,700
And then to manage the actual human-to-AI conversation, the sources highlighted the

178
00:06:51,700 --> 00:06:53,420
AI Gradio integration.

179
00:06:53,420 --> 00:06:54,420
What's Gradio?

180
00:06:54,420 --> 00:06:57,890
It's the simple web interface that translates your plain English request like, hey,

181
00:06:57,890 --> 00:06:58,340
analyze

182
00:06:58,340 --> 00:07:02,400
the Q4 sales figures in this spreadsheet into actions that the agent can actually

183
00:07:02,400 --> 00:07:02,980
execute.

184
00:07:02,980 --> 00:07:05,380
It makes the whole loop incredibly smooth.

185
00:07:05,380 --> 00:07:08,020
OK, so we've got the secure box to run it in.

186
00:07:08,020 --> 00:07:11,660
Now let's talk about the brain that goes inside it, the intelligence layer.

187
00:07:11,660 --> 00:07:13,060
What is the Agent SDK?

188
00:07:13,060 --> 00:07:17,360
The Agent SDK is what you use to actually build the intelligence, the decision

189
00:07:17,360 --> 00:07:17,900
maker,

190
00:07:17,900 --> 00:07:19,780
that runs on those CUA computers.

191
00:07:19,780 --> 00:07:23,300
Sort of the brain plug-in, it gives you a consistent way to run the language models

192
00:07:23,300 --> 00:07:24,300
themselves.

193
00:07:24,300 --> 00:07:26,660
And it seems like flexibility is key here.

194
00:07:26,660 --> 00:07:31,320
It supports a huge range of models, from the giant cloud ones to smaller open

195
00:07:31,320 --> 00:07:32,140
weight ones

196
00:07:32,140 --> 00:07:33,500
you can run on your own machine.

197
00:07:33,500 --> 00:07:38,430
It does, and a developer can switch between them just by changing a prefix in the

198
00:07:38,430 --> 00:07:39,000
code.

199
00:07:39,000 --> 00:07:43,300
If you want the power of OpenAI, or a cheaper open source option through Ollama, or

200
00:07:43,300 --> 00:07:43,660
a super

201
00:07:43,660 --> 00:07:46,940
optimized local one with MLX, the SDK just handles it.

202
00:07:46,940 --> 00:07:49,950
Okay, here's where it gets really interesting for me, because now we can actually

203
00:07:49,950 --> 00:07:50,380
measure

204
00:07:50,380 --> 00:07:53,560
how intelligent these things are in a real world setting.

205
00:07:53,560 --> 00:07:57,160
What kind of a performance jump do we see when these agents get to use the best

206
00:07:57,160 --> 00:07:57,620
models

207
00:07:57,620 --> 00:07:58,620
out there?

208
00:07:58,620 --> 00:08:01,220
The sources gave a really compelling preview of this.

209
00:08:01,220 --> 00:08:05,320
They show that when you swap the main reasoning model from something already very

210
00:08:05,320 --> 00:08:05,980
capable,

211
00:08:05,980 --> 00:08:11,470
like GPT-4o, to the next generation GPT-5, the agent just starts pulling away in

212
00:08:11,470 --> 00:08:11,500
its

213
00:08:11,500 --> 00:08:14,340
performance on these complex computer tasks.

214
00:08:14,340 --> 00:08:18,300
So the bottleneck isn't the container or the SDK, it's the raw intelligence of the

215
00:08:18,300 --> 00:08:19,300
LLM.

216
00:08:19,300 --> 00:08:20,300
Exactly.

217
00:08:20,300 --> 00:08:22,980
The infrastructure enables, but the model has to execute.

218
00:08:22,980 --> 00:08:26,920
And speaking of execution, being able to run these powerful agents locally is a

219
00:08:26,920 --> 00:08:27,540
total game

220
00:08:27,540 --> 00:08:30,020
changer for privacy and for speed.

221
00:08:30,020 --> 00:08:31,420
Tell me about that.

222
00:08:31,420 --> 00:08:37,180
Well, CUA introduces these optimized local agents, like UI-TARS-1.5-7B 6-bit.

223
00:08:37,180 --> 00:08:41,200
The fact that models this good can run natively and really efficiently on Apple

224
00:08:41,200 --> 00:08:42,020
Silicon with

225
00:08:42,020 --> 00:08:43,060
MLX

226
00:08:43,060 --> 00:08:46,420
means the future of local AI agents isn't something we're waiting for.

227
00:08:46,420 --> 00:08:47,980
We're building with it right now.

228
00:08:47,980 --> 00:08:51,780
One of the coolest concepts I saw was this idea of composed agents.

229
00:08:51,780 --> 00:08:55,680
It's not about one giant AI doing everything, but more like a team of specialists.

230
00:08:55,680 --> 00:08:56,860
Can you break that down?

231
00:08:56,860 --> 00:08:58,780
It's basically a division of labor for AI.

232
00:08:58,780 --> 00:09:02,020
I mean, why use two brains instead of one?

233
00:09:02,020 --> 00:09:04,900
Because a complex task really has two parts.

234
00:09:04,900 --> 00:09:06,980
You have perception and then you have reasoning.

235
00:09:06,980 --> 00:09:07,980
Okay.

236
00:09:07,980 --> 00:09:11,570
So a composed agent combines a specialized vision language model or a VLM,

237
00:09:11,570 --> 00:09:12,180
something

238
00:09:12,180 --> 00:09:15,580
like Moondream 3, whose only job is to understand the screen.

239
00:09:15,580 --> 00:09:18,280
It captions what it sees and finds where things are.

240
00:09:18,280 --> 00:09:20,300
It figures out where the checkout button is.

241
00:09:20,300 --> 00:09:22,740
So the VLM is the eyes that find the coordinates.

242
00:09:22,740 --> 00:09:23,740
Exactly.

243
00:09:23,740 --> 00:09:28,340
It does the visual part, which is faster and cheaper for a specialized model.

244
00:09:28,340 --> 00:09:34,160
Then you feed that visual context plus the main goal to a separate, really powerful

245
00:09:34,160 --> 00:09:34,720
LLM.

246
00:09:34,720 --> 00:09:36,300
And the LLM does what?

247
00:09:36,300 --> 00:09:38,580
The LLM handles the complex reasoning.

248
00:09:38,580 --> 00:09:42,360
It decides the next strategic step or writes a bit of code or changes the plan

249
00:09:42,360 --> 00:09:42,800
based on

250
00:09:42,800 --> 00:09:43,800
what's happening.

251
00:09:43,800 --> 00:09:47,960
By splitting up the roles, the whole agent becomes faster and much more reliable.

252
00:09:47,960 --> 00:09:51,260
And that leads right into this new focus on specialization.

253
00:09:51,260 --> 00:09:55,020
The goal isn't one super agent, but maybe a whole team of smaller agents working at

254
00:09:55,020 --> 00:09:56,020
the same time.

255
00:09:56,020 --> 00:09:57,780
That's where the field is heading.

256
00:09:57,780 --> 00:10:00,960
Instead of one agent trying to do everything, you deploy a whole fleet of them,

257
00:10:00,960 --> 00:10:01,620
each focused

258
00:10:01,620 --> 00:10:04,220
on its own app or its own narrow task.

259
00:10:04,220 --> 00:10:06,740
Like an agent just for my iPhone mirroring app.

260
00:10:06,740 --> 00:10:07,740
For instance, yeah.

261
00:10:07,740 --> 00:10:08,740
Right.

262
00:10:08,740 --> 00:10:12,100
Or one agent that's an expert at writing code in an IDE and another one that's an

263
00:10:12,100 --> 00:10:12,420
expert

264
00:10:12,420 --> 00:10:15,260
at taking those results and making a PowerPoint deck.

265
00:10:15,260 --> 00:10:21,020
That specialization is what will make really complex multi-app workflows a reality.

266
00:10:21,020 --> 00:10:26,120
So we know CUA provides the safe container and the SDKs provide the brain.

267
00:10:26,120 --> 00:10:29,280
How does the industry actually measure if these agents are any good?

268
00:10:29,280 --> 00:10:32,120
We need real metrics for this to be adopted by businesses.

269
00:10:32,120 --> 00:10:33,120
Right.

270
00:10:33,120 --> 00:10:35,740
You can't build a reliable product on just anecdotes.

271
00:10:35,740 --> 00:10:38,940
This brings us to the last part, scale and benchmarks.

272
00:10:38,940 --> 00:10:42,300
CUA has built in tools for measuring agent performance.

273
00:10:42,300 --> 00:10:43,300
What are they using?

274
00:10:43,300 --> 00:10:46,820
They rely on standardized benchmarks, like OSWorld-Verified, which tests

275
00:10:46,820 --> 00:10:47,260
general

276
00:10:47,260 --> 00:10:52,180
computer skills, and SheetBench-V2, which is all about how well an agent can handle

277
00:10:52,180 --> 00:10:54,440
spreadsheets and data analysis.

278
00:10:54,440 --> 00:10:56,720
So these benchmarks create a common standard.

279
00:10:56,720 --> 00:10:59,940
They let developers compare their agent to the state of the art and know where they

280
00:10:59,940 --> 00:11:00,400
stand.

281
00:11:00,400 --> 00:11:01,400
Precisely.

282
00:11:01,400 --> 00:11:06,890
And this shift from, you know, hey, it worked for me once, to standardized verified

283
00:11:06,890 --> 00:11:07,300
benchmarks

284
00:11:07,300 --> 00:11:09,000
is a huge deal.

285
00:11:09,000 --> 00:11:12,760
It's how teams get funding, how they prove their agent is reliable.

286
00:11:12,760 --> 00:11:17,100
They also use it to test specific models against each other, like benchmarking Moondream

287
00:11:17,100 --> 00:11:17,360
3

288
00:11:17,360 --> 00:11:19,040
against another vision model.

289
00:11:19,040 --> 00:11:20,880
It's all about data-driven improvement.

290
00:11:20,880 --> 00:11:24,400
And what are they tracking beyond just, you know, did it work or did it fail?

291
00:11:24,400 --> 00:11:26,840
The metrics you'd need for production.

292
00:11:26,840 --> 00:11:31,600
Success rate, of course, but also average time it took to finish, resource use.

293
00:11:31,600 --> 00:11:37,360
And crucially, CUA has tools for A/B testing different configurations.

294
00:11:37,360 --> 00:11:40,600
So not just testing models, but testing the prompts you give them.

295
00:11:40,600 --> 00:11:44,460
Prompts, memory settings, how much of the screen the agent can see at once, all

296
00:11:44,460 --> 00:11:44,580
those

297
00:11:44,580 --> 00:11:45,800
little variables.

298
00:11:45,800 --> 00:11:49,910
This constant evaluation is what takes AI from being a cool experiment to being

299
00:11:49,910 --> 00:11:50,480
reliable

300
00:11:50,480 --> 00:11:51,620
automation.

301
00:11:51,620 --> 00:11:53,560
This raises an important question, though.

302
00:11:53,560 --> 00:11:57,720
Once you've validated an agent, how fast can you actually deploy it and scale it up?

303
00:11:57,720 --> 00:12:00,880
And it seems like CUA offers a couple of different paths for that, depending on

304
00:12:00,880 --> 00:12:01,980
what you prioritize.

305
00:12:01,980 --> 00:12:03,840
That's the strategic choice they give you.

306
00:12:03,840 --> 00:12:08,480
For developers who want total control and want to self-host their own environments,

307
00:12:08,480 --> 00:12:10,400
there's the local open source option.

308
00:12:10,400 --> 00:12:11,400
It's free.

309
00:12:11,400 --> 00:12:12,400
It's highly customizable.

310
00:12:12,400 --> 00:12:17,370
But what about the big companies that just need massive scale, like yesterday, and

311
00:12:17,370 --> 00:12:17,800
don't

312
00:12:17,800 --> 00:12:20,880
want the headache of managing all those sandboxes?

313
00:12:20,880 --> 00:12:23,320
That's where the cloud pro-enterprise model comes in.

314
00:12:23,320 --> 00:12:27,940
It provides cloud-powered sandboxes that you access through a simple API.

315
00:12:27,940 --> 00:12:33,440
The big benefits there are unlimited scale, instant access to cross-platform

316
00:12:33,440 --> 00:12:34,720
environments,

317
00:12:34,720 --> 00:12:37,040
and a pay-as-you-go billing model.

318
00:12:37,040 --> 00:12:39,990
It's really designed to just get rid of the infrastructure problem, so developers

319
00:12:39,990 --> 00:12:40,520
can focus

320
00:12:40,520 --> 00:12:42,840
only on the agent's intelligence.

321
00:12:42,840 --> 00:12:45,690
If we connect this to the bigger picture, CUA is the crucial piece of

322
00:12:45,690 --> 00:12:46,400
infrastructure

323
00:12:46,400 --> 00:12:51,120
that just handles all the messiness of interacting with an operating system.

324
00:12:51,120 --> 00:12:52,880
It's the hammer, as one source put it.

325
00:12:52,880 --> 00:12:55,920
Yeah, they said, when you hold a hammer, everything looks like a nail.

326
00:12:55,920 --> 00:12:58,120
The CUA team is giving you the damn hammer.

327
00:12:58,120 --> 00:12:59,360
Go nail it.

328
00:12:59,360 --> 00:13:03,480
This framework really turns the abstract idea of an intelligent agent into a real

329
00:13:03,480 --> 00:13:04,080
functional

330
00:13:04,080 --> 00:13:06,960
system that can safely control your digital world.

331
00:13:06,960 --> 00:13:10,800
It lets you stop worrying about the infrastructure and start designing the outcome.

332
00:13:10,800 --> 00:13:12,560
So what does this all mean?

333
00:13:12,560 --> 00:13:17,100
The sources are pretty clear that the era of secure, reliable computer-use AI

334
00:13:17,100 --> 00:13:17,520
agents

335
00:13:17,520 --> 00:13:18,520
is.

336
00:13:18,520 --> 00:13:19,520
It's not coming.

337
00:13:19,520 --> 00:13:20,520
It's here now.

338
00:13:20,520 --> 00:13:24,010
It's changing the whole desktop experience into something that's automated and

339
00:13:24,010 --> 00:13:24,700
adaptable.

340
00:13:24,700 --> 00:13:28,180
We're moving from telling our computers what to do to basically collaborating with

341
00:13:28,180 --> 00:13:28,720
an autonomous

342
00:13:28,720 --> 00:13:29,720
partner.

343
00:13:29,720 --> 00:13:32,680
And here's a final provocative thought for you to chew on.

344
00:13:32,680 --> 00:13:36,550
AI researchers are already trying to predict the timeline for when AI will reach

345
00:13:36,550 --> 00:13:37,200
human-level

346
00:13:37,200 --> 00:13:40,400
skills on that OSWorld benchmark we mentioned.

347
00:13:40,400 --> 00:13:44,660
So think about your most common, your most repetitive computer tasks: pulling data,

348
00:13:44,660 --> 00:13:45,080
making

349
00:13:45,080 --> 00:13:47,040
reports, adjusting designs.

350
00:13:47,040 --> 00:13:50,830
How soon will an agent have a 9 in 10 chance of doing that task better, more

351
00:13:50,830 --> 00:13:51,500
reliably,

352
00:13:51,500 --> 00:13:53,400
and way faster than you can?

353
00:13:53,400 --> 00:13:55,520
Our deep dive today was brought to you by SafeServer.

354
00:13:55,520 --> 00:13:58,680
If you are looking for reliable hosting and support for your digital transformation,

355
00:13:58,680 --> 00:13:58,880
visit

356
00:13:58,880 --> 00:14:02,160
www.safeserver.de.

357
00:14:02,160 --> 00:14:03,160
Thank you for diving deep with us.

358
00:14:03,160 --> 00:14:03,160
We'll see you next time.