1 00:00:00,000 --> 00:00:03,800 Welcome to the Deep Dive, where we tear through the latest research to make sure 2 00:00:03,800 --> 00:00:04,600 you, the 3 00:00:04,600 --> 00:00:08,680 learner, get the critical knowledge without all the jargon. 4 00:00:08,680 --> 00:00:12,760 And today, we're looking at something that I think really is a fundamental shift. 5 00:00:12,760 --> 00:00:13,760 I agree. 6 00:00:13,760 --> 00:00:17,040 We are not talking about your usual chatbot today. 7 00:00:17,040 --> 00:00:21,780 We're tackling a concept that sounds like science fiction, AI that doesn't just 8 00:00:21,780 --> 00:00:22,020 talk 9 00:00:22,020 --> 00:00:24,200 to you, but actually works for you. 10 00:00:24,200 --> 00:00:28,380 I mean, an AI that can see your screen, navigate your apps, click buttons, type 11 00:00:28,380 --> 00:00:29,920 things out. 12 00:00:29,920 --> 00:00:33,840 Basically, do what a human operator does, but do it safely. 13 00:00:33,840 --> 00:00:35,720 And that's the key transition point. 14 00:00:35,720 --> 00:00:38,200 For years, this was all theoretical. 15 00:00:38,200 --> 00:00:42,820 But now, with the right kind of infrastructure, these agents can finally run these 16 00:00:42,820 --> 00:00:43,360 complex 17 00:00:43,360 --> 00:00:45,480 workflows across a bunch of different apps. 18 00:00:45,480 --> 00:00:47,320 Like, what are we talking about here? 19 00:00:47,320 --> 00:00:48,320 Oh, anything. 20 00:00:48,320 --> 00:00:52,650 Editing an image in Photoshop, handling a checkout on Amazon, or even filing a 21 00:00:52,650 --> 00:00:53,160 complex 22 00:00:53,160 --> 00:00:55,760 report that uses three different company tools. 23 00:00:55,760 --> 00:00:58,920 And the piece of infrastructure making this all possible is our 24 00:00:58,920 --> 00:00:59,840 focus today. 25 00:00:59,840 --> 00:01:02,320 Containers for computer use agents. 26 00:01:02,320 --> 00:01:04,040 Most people just call it CUA. 
27 00:01:04,040 --> 00:01:07,000 It's the security layer, the scaling framework. 28 00:01:07,000 --> 00:01:11,040 It's what gets these powerful AIs out of the lab and securely onto your desktop. 29 00:01:11,040 --> 00:01:18,670 So our mission today is to unpack how we go from a basic command-line AI to these 30 00:01:18,670 --> 00:01:19,400 agents 31 00:01:19,400 --> 00:01:24,160 that can control entire operating systems, macOS, Linux, Windows, and do it 32 00:01:24,160 --> 00:01:24,720 without 33 00:01:24,720 --> 00:01:25,720 breaking anything. 34 00:01:25,720 --> 00:01:27,400 OK, let's unpack this. 35 00:01:27,400 --> 00:01:32,120 But first, a quick note, this deep dive is supported by SafeServer. 36 00:01:32,120 --> 00:01:35,630 SafeServer manages the hosting for this exact kind of cutting-edge software, and 37 00:01:35,630 --> 00:01:36,080 they can 38 00:01:36,080 --> 00:01:38,400 support your digital transformation needs. 39 00:01:38,400 --> 00:01:41,850 So if you're looking for reliable hosting that can handle this next generation of 40 00:01:41,850 --> 00:01:42,540 computing, 41 00:01:42,540 --> 00:01:44,840 you can find out more at www.safeserver.de. 42 00:01:44,840 --> 00:01:49,200 All right, so to start, let's just get a really clear definition down. 43 00:01:49,200 --> 00:01:50,480 What is a computer use agent? 44 00:01:50,480 --> 00:01:52,080 Yeah, let's ground ourselves. 45 00:01:52,080 --> 00:01:57,570 It's an AI designed to do tasks by observing and interacting with a normal desktop 46 00:01:57,570 --> 00:01:58,240 environment. 47 00:01:58,240 --> 00:02:00,520 Just think of it as a digitized robotic hand. 48 00:02:00,520 --> 00:02:04,920 OK, so it's using simulated mouse clicks and keyboard commands to get things done. 49 00:02:04,920 --> 00:02:05,920 Exactly. 50 00:02:05,920 --> 00:02:07,720 Things that usually need a human watching over them. 51 00:02:07,720 --> 00:02:09,360 That sounds incredibly powerful. 
52 00:02:09,360 --> 00:02:13,120 I mean, it unlocks basically every piece of software that already exists. 53 00:02:13,120 --> 00:02:17,100 But the second you say you're giving an experimental AI the ability to click and 54 00:02:17,100 --> 00:02:18,240 type on my machine, 55 00:02:18,240 --> 00:02:21,840 a huge alarm bell just starts ringing in my head. 56 00:02:21,840 --> 00:02:22,840 It should. 57 00:02:22,840 --> 00:02:26,000 What's to stop it from just, you know, doing real damage? 58 00:02:26,000 --> 00:02:30,520 And that is the immediate non-negotiable problem that CUA was built to solve. 59 00:02:30,520 --> 00:02:32,680 The sources really emphasize this. 60 00:02:32,680 --> 00:02:37,450 Running these powerful, sometimes unpredictable agents locally is just, it's 61 00:02:37,450 --> 00:02:39,320 dangerous. 62 00:02:39,320 --> 00:02:40,320 How dangerous? 63 00:02:40,320 --> 00:02:44,480 There's this one anecdote that gets shared a lot from the early days of development. 64 00:02:44,480 --> 00:02:48,330 One of the teams had an agent set up that, and this is a quote, broke my computer, 65 00:02:48,330 --> 00:02:48,840 preventing 66 00:02:48,840 --> 00:02:50,400 disk writing. 67 00:02:50,400 --> 00:02:53,840 That's, that is a developer's absolute worst nightmare. 68 00:02:53,840 --> 00:02:57,720 So you've got a super smart agent, but you can't trust it not to just brick your 69 00:02:57,720 --> 00:02:58,120 whole 70 00:02:58,120 --> 00:02:59,120 system. 71 00:02:59,120 --> 00:03:00,120 Precisely. 72 00:03:00,120 --> 00:03:03,760 You can't just hand over the keys to your entire operating system to a tool that is, 73 00:03:03,760 --> 00:03:05,280 at its core, experimental. 74 00:03:05,280 --> 00:03:08,870 And what's fascinating here is that the community saw this risk immediately and 75 00:03:08,870 --> 00:03:09,560 demanded some 76 00:03:09,560 --> 00:03:10,840 kind of containment. 77 00:03:10,840 --> 00:03:12,720 CUA provides that safety. 
78 00:03:12,720 --> 00:03:16,340 And the source material has a great analogy for it, which is CUA is effectively 79 00:03:16,340 --> 00:03:16,760 Docker 80 00:03:16,760 --> 00:03:18,440 for computer use agents. 81 00:03:18,440 --> 00:03:23,020 I love that analogy, but Docker is usually for, you know, server stuff, right? 82 00:03:23,020 --> 00:03:25,840 Processes without a graphical interface. 83 00:03:25,840 --> 00:03:29,760 Virtualizing a full desktop with a mouse and windows and video inside a container 84 00:03:29,760 --> 00:03:30,200 sounds 85 00:03:30,200 --> 00:03:31,200 way harder. 86 00:03:31,200 --> 00:03:32,200 It is much harder. 87 00:03:32,200 --> 00:03:33,560 And that's the key distinction. 88 00:03:33,560 --> 00:03:35,760 CUA doesn't just isolate a process. 89 00:03:35,760 --> 00:03:38,200 It isolates the entire interactive environment. 90 00:03:38,200 --> 00:03:43,690 It lets these agents control a full operating system, macOS, Linux, Windows, inside a 91 00:03:43,690 --> 00:03:44,040 secure 92 00:03:44,040 --> 00:03:45,160 virtual machine. 93 00:03:45,160 --> 00:03:46,840 So the isolation is total. 94 00:03:46,840 --> 00:03:47,840 Comprehensive. 95 00:03:47,840 --> 00:03:51,200 It makes sure the agent's actions stay inside that contained VM. 96 00:03:51,200 --> 00:03:52,740 So no data leaks out. 97 00:03:52,740 --> 00:03:55,780 And critically, the agent can't damage your main system. 98 00:03:55,780 --> 00:03:59,070 It's the framework that makes the whole idea of an intelligent agent actually 99 00:03:59,070 --> 00:03:59,600 functional 100 00:03:59,600 --> 00:04:00,600 and reliable. 101 00:04:00,600 --> 00:04:01,600 OK. 102 00:04:01,600 --> 00:04:07,090 So CUA is the foundation that makes this whole leap in human-computer interaction 103 00:04:07,090 --> 00:04:07,640 safe. 
104 00:04:07,640 --> 00:04:11,730 It lets us build tools that really adapt to us, which should make technology feel 105 00:04:11,730 --> 00:04:12,040 more 106 00:04:12,040 --> 00:04:13,040 intuitive. 107 00:04:13,040 --> 00:04:15,020 So let's get into the how. 108 00:04:15,020 --> 00:04:19,320 How does the architecture pull off this mix of security and, I assume, high 109 00:04:19,320 --> 00:04:20,360 performance? 110 00:04:20,360 --> 00:04:21,360 Right. 111 00:04:21,360 --> 00:04:22,360 The architecture. 112 00:04:22,360 --> 00:04:25,640 So security is handled through what they call local sandboxes. 113 00:04:25,640 --> 00:04:29,200 Every single thing the agent does runs in this isolated environment, whether it's a 114 00:04:29,200 --> 00:04:30,940 full VM or a container. 115 00:04:30,940 --> 00:04:33,440 And that just guarantees privacy and security. 116 00:04:33,440 --> 00:04:34,440 It does. 117 00:04:34,440 --> 00:04:38,130 For instance, if the agent messes up and deletes a system file, that action is 118 00:04:38,130 --> 00:04:39,220 completely confined 119 00:04:39,220 --> 00:04:40,220 to the virtual world. 120 00:04:40,220 --> 00:04:42,120 Your actual computer is totally untouched. 121 00:04:42,120 --> 00:04:46,260 And the sources mentioned a specific performance thing, an optimization, that 122 00:04:46,260 --> 00:04:47,000 seemed really 123 00:04:47,000 --> 00:04:49,840 aimed at developers using the newest hardware. 124 00:04:49,840 --> 00:04:50,840 That's right. 125 00:04:50,840 --> 00:04:53,800 CUA is highly, highly optimized for Apple Silicon. 126 00:04:53,800 --> 00:04:57,200 This is a really critical design choice for developers who are doing a lot of local 127 00:04:57,200 --> 00:04:57,880 testing. 128 00:04:57,880 --> 00:05:02,920 Because by taking advantage of the M series chips, CUA gets, and I'm quoting here, 129 00:05:02,920 --> 00:05:03,500 blazing 130 00:05:03,500 --> 00:05:06,660 fast performance and energy efficiency. 
131 00:05:06,660 --> 00:05:11,580 You can run the agent and its simulated computer at the same time on one laptop. 132 00:05:11,580 --> 00:05:12,580 Really fast. 133 00:05:12,580 --> 00:05:16,180 That makes total sense for a developer testing on their own machine. 134 00:05:16,180 --> 00:05:19,300 But it does bring up a pretty big question for bigger companies. 135 00:05:19,300 --> 00:05:23,550 If you optimize so heavily for Apple Silicon, don't you create a kind of vendor 136 00:05:23,550 --> 00:05:24,080 lock-in 137 00:05:24,080 --> 00:05:29,340 problem for enterprises that use, you know, huge Linux or Windows server farms? 138 00:05:29,340 --> 00:05:33,440 That is an excellent and a really critical question. 139 00:05:33,440 --> 00:05:37,550 The way the framework handles this is by using that optimization as a development 140 00:05:37,550 --> 00:05:38,160 booster, 141 00:05:38,160 --> 00:05:39,560 but building the system to be cross-platform. 142 00:05:39,560 --> 00:05:40,560 OK. 143 00:05:40,560 --> 00:05:41,560 So how does that work? 144 00:05:41,560 --> 00:05:45,420 Well, under the hood, it combines a super optimized macOS virtual machine with a 145 00:05:45,420 --> 00:05:45,700 more 146 00:05:45,700 --> 00:05:48,180 generic Python control interface. 147 00:05:48,180 --> 00:05:51,960 So while your local development is extra fast on an M series Mac, the framework 148 00:05:51,960 --> 00:05:52,540 itself is 149 00:05:52,540 --> 00:05:57,080 designed to manage environments for macOS, Linux using Docker, and Windows using 150 00:05:57,080 --> 00:05:57,260 its 151 00:05:57,260 --> 00:05:58,260 own sandboxes. 152 00:05:58,260 --> 00:06:01,350 Ah, so you can prototype really fast locally and deploy it to whatever cloud 153 00:06:01,350 --> 00:06:02,000 environment 154 00:06:02,000 --> 00:06:03,000 you need. 155 00:06:03,000 --> 00:06:04,000 Exactly. 156 00:06:04,000 --> 00:06:05,000 That makes sense. 
157 00:06:05,000 --> 00:06:07,490 But for a developer, that still sounds like a lot to manage, VMs, containers, 158 00:06:07,490 --> 00:06:07,940 different 159 00:06:07,940 --> 00:06:11,920 operating systems, Python, how do you interact with all of that without just 160 00:06:11,920 --> 00:06:12,660 spending all 161 00:06:12,660 --> 00:06:14,380 your time on infrastructure? 162 00:06:14,380 --> 00:06:17,220 That is the whole job of the computer SDK. 163 00:06:17,220 --> 00:06:22,230 You should think of the computer SDK as like the unified robotic hand that controls 164 00:06:22,230 --> 00:06:23,020 everything 165 00:06:23,020 --> 00:06:24,680 in that virtual environment. 166 00:06:24,680 --> 00:06:25,680 It hides all that complexity. 167 00:06:25,680 --> 00:06:30,060 So I don't have to worry if I'm talking to a Linux container or a macOS VM. 168 00:06:30,060 --> 00:06:31,060 Nope. 169 00:06:31,060 --> 00:06:33,340 You just use one consistent, simple Python API. 170 00:06:33,340 --> 00:06:34,340 Yeah. 171 00:06:34,340 --> 00:06:36,220 It's actually a lot like PyAutoGUI, if you've ever used that. 172 00:06:36,220 --> 00:06:40,510 Oh, okay. So if I know how to automate a simple click with a tool like that, the 173 00:06:40,510 --> 00:06:40,860 computer 174 00:06:40,860 --> 00:06:42,380 SDK lets me do the same thing. 175 00:06:42,380 --> 00:06:46,580 Click, type, scroll, but safely inside the monitored CUA container. 176 00:06:46,580 --> 00:06:47,580 Precisely. 177 00:06:47,580 --> 00:06:51,700 And then to manage the actual human-to-AI conversation, the sources highlighted the 178 00:06:51,700 --> 00:06:53,420 AI Gradio integration. 179 00:06:53,420 --> 00:06:54,420 What's Gradio? 
180 00:06:54,420 --> 00:06:57,890 It's the simple web interface that translates your plain English request like, hey, 181 00:06:57,890 --> 00:06:58,340 analyze 182 00:06:58,340 --> 00:07:02,400 the Q4 sales figures in this spreadsheet into actions that the agent can actually 183 00:07:02,400 --> 00:07:02,980 execute. 184 00:07:02,980 --> 00:07:05,380 It makes the whole loop incredibly smooth. 185 00:07:05,380 --> 00:07:08,020 OK, so we've got the secure box to run it in. 186 00:07:08,020 --> 00:07:11,660 Now let's talk about the brain that goes inside it, the intelligence layer. 187 00:07:11,660 --> 00:07:13,060 What is the Agent SDK? 188 00:07:13,060 --> 00:07:17,360 The Agent SDK is what you use to actually build the intelligence, the decision 189 00:07:17,360 --> 00:07:17,900 maker, 190 00:07:17,900 --> 00:07:19,780 that runs on those CUA computers. 191 00:07:19,780 --> 00:07:23,300 Sort of the brain plug-in, it gives you a consistent way to run the language models 192 00:07:23,300 --> 00:07:24,300 themselves. 193 00:07:24,300 --> 00:07:26,660 And it seems like flexibility is key here. 194 00:07:26,660 --> 00:07:31,320 It supports a huge range of models, from the giant cloud ones to smaller open-weight 195 00:07:31,320 --> 00:07:32,140 ones 196 00:07:32,140 --> 00:07:33,500 you can run on your own machine. 197 00:07:33,500 --> 00:07:38,430 It does, and a developer can switch between them just by changing a prefix in the 198 00:07:38,430 --> 00:07:39,000 code. 199 00:07:39,000 --> 00:07:43,300 If you want the power of OpenAI, or a cheaper open source option through Ollama, or 200 00:07:43,300 --> 00:07:43,660 a super 201 00:07:43,660 --> 00:07:46,940 optimized local one with MLX, the SDK just handles it. 202 00:07:46,940 --> 00:07:49,950 Okay, here's where it gets really interesting for me, because now we can actually 203 00:07:49,950 --> 00:07:50,380 measure 204 00:07:50,380 --> 00:07:53,560 how intelligent these things are in a real world setting. 
205 00:07:53,560 --> 00:07:57,160 What kind of a performance jump do we see when these agents get to use the best 206 00:07:57,160 --> 00:07:57,620 models 207 00:07:57,620 --> 00:07:58,620 out there? 208 00:07:58,620 --> 00:08:01,220 The sources gave a really compelling preview of this. 209 00:08:01,220 --> 00:08:05,320 They show that when you swap the main reasoning model from something already very 210 00:08:05,320 --> 00:08:05,980 capable, 211 00:08:05,980 --> 00:08:11,470 like GPT-4o, to the next-generation GPT-5, the agent just starts pulling away in 212 00:08:11,470 --> 00:08:11,500 its 213 00:08:11,500 --> 00:08:14,340 performance on these complex computer tasks. 214 00:08:14,340 --> 00:08:18,300 So the bottleneck isn't the container or the SDK, it's the raw intelligence of the 215 00:08:18,300 --> 00:08:19,300 LLM. 216 00:08:19,300 --> 00:08:20,300 Exactly. 217 00:08:20,300 --> 00:08:22,980 The infrastructure enables, but the model has to execute. 218 00:08:22,980 --> 00:08:26,920 And speaking of execution, being able to run these powerful agents locally is a 219 00:08:26,920 --> 00:08:27,540 total game 220 00:08:27,540 --> 00:08:30,020 changer for privacy and for speed. 221 00:08:30,020 --> 00:08:31,420 Tell me about that. 222 00:08:31,420 --> 00:08:37,180 Well, CUA introduces these optimized local agents, like UI-TARS-1.5-7B-6bit. 223 00:08:37,180 --> 00:08:41,200 The fact that models this good can run natively and really efficiently on Apple 224 00:08:41,200 --> 00:08:42,020 Silicon with 225 00:08:42,020 --> 00:08:43,060 MLX 226 00:08:43,060 --> 00:08:46,420 means the future of local AI agents isn't something we're waiting for. 227 00:08:46,420 --> 00:08:47,980 We're building with it right now. 228 00:08:47,980 --> 00:08:51,780 One of the coolest concepts I saw was this idea of composed agents. 229 00:08:51,780 --> 00:08:55,680 It's not about one giant AI doing everything, but more like a team of specialists. 
230 00:08:55,680 --> 00:08:56,860 Can you break that down? 231 00:08:56,860 --> 00:08:58,780 It's basically a division of labor for AI. 232 00:08:58,780 --> 00:09:02,020 I mean, why use two brains instead of one? 233 00:09:02,020 --> 00:09:04,900 Because a complex task really has two parts. 234 00:09:04,900 --> 00:09:06,980 You have perception and then you have reasoning. 235 00:09:06,980 --> 00:09:07,980 Okay. 236 00:09:07,980 --> 00:09:11,570 So a composed agent combines a specialized vision language model or a VLM, 237 00:09:11,570 --> 00:09:12,180 something 238 00:09:12,180 --> 00:09:15,580 like Moondream 3, whose only job is to understand the screen. 239 00:09:15,580 --> 00:09:18,280 It captions what it sees and finds where things are. 240 00:09:18,280 --> 00:09:20,300 It figures out where the checkout button is. 241 00:09:20,300 --> 00:09:22,740 So the VLM is the eyes that find the coordinates. 242 00:09:22,740 --> 00:09:23,740 Exactly. 243 00:09:23,740 --> 00:09:28,340 It does the visual part, which is faster and cheaper for a specialized model. 244 00:09:28,340 --> 00:09:34,160 Then you feed that visual context plus the main goal to a separate, really powerful 245 00:09:34,160 --> 00:09:34,720 LLM. 246 00:09:34,720 --> 00:09:36,300 And the LLM does what? 247 00:09:36,300 --> 00:09:38,580 The LLM handles the complex reasoning. 248 00:09:38,580 --> 00:09:42,360 It decides the next strategic step or writes a bit of code or changes the plan 249 00:09:42,360 --> 00:09:42,800 based on 250 00:09:42,800 --> 00:09:43,800 what's happening. 251 00:09:43,800 --> 00:09:47,960 By splitting up the roles, the whole agent becomes faster and much more reliable. 252 00:09:47,960 --> 00:09:51,260 And that leads right into this new focus on specialization. 253 00:09:51,260 --> 00:09:55,020 The goal isn't one super agent, but maybe a whole team of smaller agents working at 254 00:09:55,020 --> 00:09:56,020 the same time. 
255 00:09:56,020 --> 00:09:57,780 That's where the field is heading. 256 00:09:57,780 --> 00:10:00,960 Instead of one agent trying to do everything, you deploy a whole fleet of them, 257 00:10:00,960 --> 00:10:01,620 each focused 258 00:10:01,620 --> 00:10:04,220 on its own app or its own narrow task. 259 00:10:04,220 --> 00:10:06,740 Like an agent just for my iPhone mirroring app. 260 00:10:06,740 --> 00:10:07,740 For instance, yeah. 261 00:10:07,740 --> 00:10:08,740 Right. 262 00:10:08,740 --> 00:10:12,100 Or one agent that's an expert at writing code in an IDE and another one that's an 263 00:10:12,100 --> 00:10:12,420 expert 264 00:10:12,420 --> 00:10:15,260 at taking those results and making a PowerPoint deck. 265 00:10:15,260 --> 00:10:21,020 That specialization is what will make really complex multi-app workflows a reality. 266 00:10:21,020 --> 00:10:26,120 So we know CUA provides the safe container and the SDKs provide the brain. 267 00:10:26,120 --> 00:10:29,280 How does the industry actually measure if these agents are any good? 268 00:10:29,280 --> 00:10:32,120 We need real metrics for this to be adopted by businesses. 269 00:10:32,120 --> 00:10:33,120 Right. 270 00:10:33,120 --> 00:10:35,740 You can't build a reliable product on just anecdotes. 271 00:10:35,740 --> 00:10:38,940 This brings us to the last part, scale and benchmarks. 272 00:10:38,940 --> 00:10:42,300 CUA has built-in tools for measuring agent performance. 273 00:10:42,300 --> 00:10:43,300 What are they using? 274 00:10:43,300 --> 00:10:46,820 They rely on standardized benchmarks, like OSWorld-Verified, which tests 275 00:10:46,820 --> 00:10:47,260 general 276 00:10:47,260 --> 00:10:52,180 computer skills, and SheetBench V2, which is all about how well an agent can handle 277 00:10:52,180 --> 00:10:54,440 spreadsheets and data analysis. 278 00:10:54,440 --> 00:10:56,720 So these benchmarks create a common standard. 
279 00:10:56,720 --> 00:10:59,940 They let developers compare their agent to the state of the art and know where they 280 00:10:59,940 --> 00:11:00,400 stand. 281 00:11:00,400 --> 00:11:01,400 Precisely. 282 00:11:01,400 --> 00:11:06,890 And this shift from, you know, hey, it worked for me once, to standardized, verified 283 00:11:06,890 --> 00:11:07,300 benchmarks 284 00:11:07,300 --> 00:11:09,000 is a huge deal. 285 00:11:09,000 --> 00:11:12,760 It's how teams get funding, how they prove their agent is reliable. 286 00:11:12,760 --> 00:11:17,100 They also use it to test specific models against each other, like benchmarking Moondream 287 00:11:17,100 --> 00:11:17,360 3 288 00:11:17,360 --> 00:11:19,040 against another vision model. 289 00:11:19,040 --> 00:11:20,880 It's all about data-driven improvement. 290 00:11:20,880 --> 00:11:24,400 And what are they tracking beyond just, you know, did it work or did it fail? 291 00:11:24,400 --> 00:11:26,840 The metrics you'd need for production. 292 00:11:26,840 --> 00:11:31,600 Success rate, of course, but also average time it took to finish, resource use. 293 00:11:31,600 --> 00:11:37,360 And crucially, CUA has tools for A/B testing different configurations. 294 00:11:37,360 --> 00:11:40,600 So not just testing models, but testing the prompts you give them. 295 00:11:40,600 --> 00:11:44,460 Prompts, memory settings, how much of the screen the agent can see at once, all 296 00:11:44,460 --> 00:11:44,580 those 297 00:11:44,580 --> 00:11:45,800 little variables. 298 00:11:45,800 --> 00:11:49,910 This constant evaluation is what takes AI from being a cool experiment to being 299 00:11:49,910 --> 00:11:50,480 reliable 300 00:11:50,480 --> 00:11:51,620 automation. 301 00:11:51,620 --> 00:11:53,560 This raises an important question, though. 302 00:11:53,560 --> 00:11:57,720 Once you've validated an agent, how fast can you actually deploy it and scale it up? 
303 00:11:57,720 --> 00:12:00,880 And it seems like CUA offers a couple of different paths for that, depending on 304 00:12:00,880 --> 00:12:01,980 what you prioritize. 305 00:12:01,980 --> 00:12:03,840 That's the strategic choice they give you. 306 00:12:03,840 --> 00:12:08,480 For developers who want total control and want to self-host their own environments, 307 00:12:08,480 --> 00:12:10,400 there's the local open source option. 308 00:12:10,400 --> 00:12:11,400 It's free. 309 00:12:11,400 --> 00:12:12,400 It's highly customizable. 310 00:12:12,400 --> 00:12:17,370 But what about the big companies that just need massive scale, like yesterday, and 311 00:12:17,370 --> 00:12:17,800 don't 312 00:12:17,800 --> 00:12:20,880 want the headache of managing all those sandboxes? 313 00:12:20,880 --> 00:12:23,320 That's where the cloud pro-enterprise model comes in. 314 00:12:23,320 --> 00:12:27,940 It provides cloud-powered sandboxes that you access through a simple API. 315 00:12:27,940 --> 00:12:33,440 The big benefits there are unlimited scale, instant access to cross-platform 316 00:12:33,440 --> 00:12:34,720 environments, 317 00:12:34,720 --> 00:12:37,040 and a pay-as-you-go billing model. 318 00:12:37,040 --> 00:12:39,990 It's really designed to just get rid of the infrastructure problem, so developers 319 00:12:39,990 --> 00:12:40,520 can focus 320 00:12:40,520 --> 00:12:42,840 only on the agent's intelligence. 321 00:12:42,840 --> 00:12:45,690 If we connect this to the bigger picture, CUA is the crucial piece of 322 00:12:45,690 --> 00:12:46,400 infrastructure 323 00:12:46,400 --> 00:12:51,120 that just handles all the messiness of interacting with an operating system. 324 00:12:51,120 --> 00:12:52,880 It's the hammer, as one source put it. 325 00:12:52,880 --> 00:12:55,920 Yeah, they said, when you hold a hammer, everything looks like a nail. 326 00:12:55,920 --> 00:12:58,120 The CUA team is giving you the damn hammer. 
327 00:12:58,120 --> 00:12:59,360 Go nail it. 328 00:12:59,360 --> 00:13:03,480 This framework really turns the abstract idea of an intelligent agent into a real 329 00:13:03,480 --> 00:13:04,080 functional 330 00:13:04,080 --> 00:13:06,960 system that can safely control your digital world. 331 00:13:06,960 --> 00:13:10,800 It lets you stop worrying about the infrastructure and start designing the outcome. 332 00:13:10,800 --> 00:13:12,560 So what does this all mean? 333 00:13:12,560 --> 00:13:17,100 The sources are pretty clear that the era of secure, reliable computer-use AI 334 00:13:17,100 --> 00:13:17,520 agents 335 00:13:17,520 --> 00:13:18,520 is... 336 00:13:18,520 --> 00:13:19,520 It's not coming. 337 00:13:19,520 --> 00:13:20,520 It's here now. 338 00:13:20,520 --> 00:13:24,010 It's changing the whole desktop experience into something that's automated and 339 00:13:24,010 --> 00:13:24,700 adaptable. 340 00:13:24,700 --> 00:13:28,180 We're moving from telling our computers what to do to basically collaborating with 341 00:13:28,180 --> 00:13:28,720 an autonomous 342 00:13:28,720 --> 00:13:29,720 partner. 343 00:13:29,720 --> 00:13:32,680 And here's a final provocative thought for you to chew on. 344 00:13:32,680 --> 00:13:36,550 AI researchers are already trying to predict the timeline for when AI will reach 345 00:13:36,550 --> 00:13:37,200 human-level 346 00:13:37,200 --> 00:13:40,400 skills on that OSWorld benchmark we mentioned. 347 00:13:40,400 --> 00:13:44,660 So think about your most common, your most repetitive computer tasks: pulling data, 348 00:13:44,660 --> 00:13:45,080 making 349 00:13:45,080 --> 00:13:47,040 reports, adjusting designs. 350 00:13:47,040 --> 00:13:50,830 How soon will an agent have a 9 in 10 chance of doing that task better, more 351 00:13:50,830 --> 00:13:51,500 reliably, 352 00:13:51,500 --> 00:13:53,400 and way faster than you can? 
353 00:13:53,400 --> 00:13:55,520 Our deep dive today was brought to you by SafeServer. 354 00:13:55,520 --> 00:13:58,680 If you are looking for reliable hosting and support for your digital transformation, 355 00:13:58,680 --> 00:13:58,880 visit 356 00:13:58,880 --> 00:14:02,160 www.safeserver.de. 357 00:14:02,160 --> 00:14:03,160 Thank you for diving deep with us. 358 00:14:03,160 --> 00:14:03,160 We'll see you next time.