Welcome back to the deep dive.
If you've been watching fields like computer vision or multimodal machine learning grow, you quickly realize something. It's often not the algorithms holding things back. It's the data: managing it, cleaning it up.
Yeah, and figuring out what's actually worth labeling. That's a huge one.
Exactly. So today, we're doing a deep dive into a platform built specifically for that problem, Lightly Studio. We've gathered a bunch of sources to really unpack this open source tool. Our goal here is pretty simple: break down what Lightly Studio is, who it's for, and walk through the key ideas that, well, aim to turn messy data wrangling into something more automated. Try to make it approachable, even if you're just getting started in this space.
But before we jump into the data pipelines themselves, we really want to thank the supporter of this deep dive, SafeServer. SafeServer handles hosting for exactly this kind of specialized software, and they support your digital transformation. So when you need to deploy tools like Lightly Studio, they provide that crucial infrastructure. You can find out more and start your own journey at www.safeserver.de.
And that infrastructure piece is really key, because scaling these modern ML projects often hits a wall when the data just gets out of control. You've got to curate it, label it right, keep track of every version, every change. It's a lot.
Lightly Studio really pitches itself as that unified data platform for multimodal ML. The idea is consolidating all those tricky, separate steps into one place, making it manageable.
OK, so let's start right there. What is this tool, fundamentally? We know it's from Lightly, it's open source. But what are the main tasks it's trying to unify?
Right, it's about unifying that whole workflow: curation, annotation, and management.
It was really built by ML engineers, for ML engineers and organizations that are trying to scale up computer vision work. They recognize that, well, speed and flexibility are paramount.
And speed, you mentioned. How does it achieve that?
Well, the sources point to a pretty key architectural choice. It's built using Rust.
Rust, OK.
Yeah, and Rust is known for performance, right? And memory safety. So what that means for someone using it is efficiency. You can actually handle massive datasets, at COCO or ImageNet scale, and do some pretty heavy processing, even on, say, standard hardware. Not necessarily a giant server farm.
Like what kind of standard hardware?
I think like a decent laptop, an M1 MacBook Pro with maybe 16 gigs of RAM, that kind of thing.
OK, that's impressive. So it's got this powerful engine. What are the actual functions? What does the platform do day to day?
Yeah, there are really four core functions that kind of cover the whole data lifecycle, representing its main value. First up, you've got label and QA, so built-in tools for annotating images and videos. Pretty essential. Second, it helps you understand and visualize your data. So you can filter it, automatically find exact duplicates, which honestly saves so much time, spot those really important edge cases, and also catch data drift as your real-world conditions change.
Okay, makes sense.
Third, it lets you intelligently curate data. This means automatically selecting the samples that are actually the most valuable for training or fine-tuning your model. We'll probably dig into that more later.
Yeah, definitely want to circle back to that. And the last one?
And finally, after you've done all that work, you need to export and deploy that curated dataset. And it lets you do that whether you're running Lightly Studio on your own machines, on-prem, using a hybrid cloud setup, or fully in the cloud.
Gotcha, flexible deployment.
Exactly. And you know, we keep saying multimodal ML. We should probably clarify that a bit.
Good point.
It means the platform isn't just for, like, standard photos. It really supports a wide range of data types. Images, sure, but also video clips, audio files, text, and importantly, even specialized formats like DICOM data.
Oh, the medical imaging format.
That's the one, yeah, for X-rays, MRIs, that kind of thing. So having that breadth really makes it a genuinely unified hub for different data types.
OK, so it handles a lot of data types. Let's unpack who actually uses this. Who benefits from this unification? Because the sources mention, like, two main groups that don't always talk to each other.
Yeah, that's a good way to put it. It tries to bridge that gap. So on one side, you've got the ML engineers, data scientists, the infrastructure folks.
Yeah. What's in it for them specifically?
Right. For the engineers, it's really all about integration and automation. It's built with SDKs and API support, all based on open source standards. So the idea is it slots into their existing ML stacks pretty easily, without needing a total rewrite of everything.
Less disruption.
Exactly. And automation is key, right? That's handled mainly through the Python SDK. So they can script everything: importing data, managing it. They can pull data straight from where they usually keep it, like local folders or cloud storage, like S3 or GCS.
Those are the big ones from Amazon and Google Cloud, right?
Yeah, the standard object storage. And crucially, once data is in, it's not like it's locked forever. You can keep adding new data to existing datasets as you get it, which is, well, vital for any kind of research or iterative development.
Absolutely. OK, so that's the engineers. Then on the other side, you mentioned labelers and project managers. These might be less technical users.
Often, yeah. They're focused on the quality assurance, managing large teams, logistics. For them, the platform emphasizes more intuitive workflows. So a user-friendly GUI, tools for collaboration, and really critical features like dataset versioning.
Oh, versioning is huge. Anyone who's tried to reproduce an old result knows that pain.
Totally. Knowing exactly which version of the labels you used six months ago, yeah, it's crucial. Plus, things like role-based permissions to manage who can do what in the annotation team, which make sense for project managers.
So it's bridging these two worlds. And interestingly, it seems very focused on making it easy to switch to Lightly Studio. The sources actually call out that it simplifies migrating data from other tools.
Oh, like competitors.
Yeah, they mention names like Encord, Voxel51, Ultralytics, V7 Labs, Roboflow, popular tools in the space. It seems like they actively want to be that central data hub, reducing the friction if the team decides, OK, we need to consolidate onto one platform.
OK, that makes strategic sense. Now, here's where I think it gets really interesting, especially for folks wanting to automate things: the Python interface. We don't need to become Python experts here, but understanding the basic concepts seems key to unlocking that automation power. What are the main building blocks for a beginner?
Yeah, definitely. You can think of it like setting up a big physical filing system; that makes it easier to grasp. So the first main concept is the dataset. If you're using the Python interface, this is sort of your top-level thing. Think of the dataset like your main binder, or the whole filing cabinet. You use it to set up your data, connect to the database file where everything's stored, and kick off the visual interface if you need it. And critically, it's what you use to run your queries and selections to find specific data.
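[Editor's note: to make the filing-cabinet analogy concrete, here is a minimal Python sketch of that dataset setup. The module path and method names (lightly_studio, Dataset.create, add_samples_from_yolo, start_gui) are illustrative assumptions based on the discussion, not confirmed Lightly Studio API.]

```python
# Illustrative sketch only; module and method names are assumptions,
# not confirmed Lightly Studio API.
from lightly_studio import Dataset  # assumed import path

# The dataset is the "filing cabinet": creating it connects to the
# database file where everything is stored.
dataset = Dataset.create(name="traffic-cams", db_path="traffic_cams.db")

# Populate it from a common annotation format (YOLO-style folders here).
dataset.add_samples_from_yolo(images_dir="data/images", labels_dir="data/labels")

# Optionally launch the visual interface to browse what was imported.
dataset.start_gui()
```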
OK, so the dataset is the container. Then what's inside?
Inside, you have the sample. So if the dataset is the binder, the sample is like a single page or a single file inside it. It's just one data instance. Could be one image, one audio clip, whatever.
And what does it know about itself?
It holds all the key info: the unique ID, like a serial number; the file name; where it lives on disk, the absolute path; and importantly, a list of descriptive tags. Tags could be anything, like reviewed, needs labeling, nighttime. Simple labels you attach. And it also gives you access to the metadata. That's all the other descriptive stuff about the file: image resolution, when it was captured, maybe GPS coordinates. Depends on the data.
Dataset holds samples, samples hold info and tags. What's the third piece?
The third piece is the real power move: dataset queries. This is how you find very specific subsets of your data without looking through potentially millions of files manually. Queries let you combine filtering, sorting, and slicing using standard logic, those Boolean expressions: AND, OR, NOT.
So is this just like filtering columns in a spreadsheet, or is it more powerful with this kind of data?
It's way more powerful, because you can query based on the tags and the metadata at the same time. So, for instance, you could build a query like: find all samples that are tagged "needs labeling", OR where the image width is less than 500 pixels AND they have not been tagged as "reviewed" yet.
OK, so you can get really specific, to find potential problems or gaps.
Exactly. It finds that precise set of data that maybe slipped through your initial checks or needs special attention.
That feels like the big win right there, turning data prep from this manual slog into something you can script.
Precisely.
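[Editor's note: the sources describe what queries can do, but not the SDK's exact query syntax. So here is a small, self-contained plain-Python sketch of the idea: samples carry tags and metadata, and a Boolean expression picks out the subset that needs attention. The Sample dataclass and field names below are for illustration only, not the actual Lightly Studio objects.]

```python
# Plain-Python illustration of the tag + metadata query idea; this is
# not the actual Lightly Studio query API.
from dataclasses import dataclass, field

@dataclass
class Sample:
    sample_id: str
    file_name: str
    tags: set = field(default_factory=set)
    metadata: dict = field(default_factory=dict)

samples = [
    Sample("001", "day_0001.jpg", {"reviewed"}, {"width": 1920}),
    Sample("002", "night_0042.jpg", {"needs labeling"}, {"width": 1920}),
    Sample("003", "thumb_0007.jpg", set(), {"width": 320}),
]

# Tagged "needs labeling", OR (width < 500 AND not yet tagged "reviewed").
needs_attention = [
    s for s in samples
    if "needs labeling" in s.tags
    or (s.metadata.get("width", 0) < 500 and "reviewed" not in s.tags)
]

# Act on the subset right away, e.g. tag everything for review so it is
# easy to filter in the visual interface later.
for s in needs_attention:
    s.tags.add("needs review")

print([s.file_name for s in needs_attention])
# ['night_0042.jpg', 'thumb_0007.jpg']
```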
And once you run that query, the really useful part is you can then immediately do something with that subset, like apply a new tag to all of them, say, "needs review". Then later, in the visual interface, they're super easy to find and work on, just by filtering for that tag.
Nice. And for beginners, getting data in seems pretty straightforward, too. The sources mention easy ways to load data from common formats, like YOLO. That's for object detection, right? Bounding boxes?
Yeah. And COCO, which is often used for instance segmentation or image captions. You just use simple Python functions, like an add-samples-from-YOLO or add-samples-from-COCO call. Keeping the barrier to entry low for common formats.
Good. Now, this feels like a good transition to that feature you mentioned earlier, the one that really shows off the platform's advanced side: selection. You said this is where it saves real money and time.
Yeah. This is arguably the core IP, the really smart bit. Selection is basically automated data selection. And the purpose is simple but huge: save potentially massive labeling costs and cut down training time, while actually improving your final model quality.
How does that work? If I have, say, a million images, but only budget to label a few hundred, how does it pick the best few hundred? That sounds tricky.
It is tricky. That's why automation helps. It avoids human bias and, frankly, the tediousness of trying to ensure variety manually. The mechanism works by automatically picking the samples considered most useful. And it does this by balancing two key factors that models need to be robust.
OK, what are the two factors?
First, you need representative samples. This is your core data, the typical stuff your model will see 95% of the time, the normal cases.
Right, the bread and butter.
But if you only train on that, your model falls apart the moment something slightly unusual happens. So second, you need diverse samples. These are the crucial edge cases, the novel or rare examples, the stuff the model hasn't really seen before but needs to handle.
OK, so it's like, if I'm training a self-driving car model, I need lots of pictures of normal daytime driving. That's your representative data. But I also absolutely need examples of driving in heavy rain at night, maybe with a weird object on the road. Those are your diverse, edge-case samples.
Exactly. Selection aims to pick a subset that intelligently balances both.
So if I just labeled 100 pictures of the same boring highway on a sunny day, I've kind of wasted 99 labels. Selection stops that.
That's the idea. It forces variety into your labeled set. And you, the user, get to control this balance. You can use different strategies. For example, a metadata weighting strategy: maybe you tell it to prioritize samples tagged "nighttime" or "rainy", because you know those are hard cases for your model.
OK, using the tags we talked about earlier.
Right. Or you could use something like an embedding diversity strategy. This is more AI-driven. It looks at the actual visual content using embeddings, numerical representations of the images, and picks samples that are mathematically distant, or different, from what's already been selected, even if a human didn't explicitly tag them as diverse.
Wow, OK, that sounds powerful.
It leads to some pretty impressive results, according to the sources. They cite things like up to an 80% cut in annotation costs.
80%?
Yeah. And model iteration cycles getting three times faster. Plus actual accuracy bumps, sometimes 10%, sometimes even as high as 36%.
That's significant. It really underlines that focusing on data quality over just raw quantity pays off.
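[Editor's note: to show what "mathematically distant in embedding space" can mean in practice, here is a generic, self-contained sketch of greedy farthest-point selection over image embeddings. This is a standard diversity-sampling technique used purely for illustration; it is not Lightly Studio's actual selection algorithm, and the embeddings below are random placeholders.]

```python
# Generic illustration of embedding-diversity selection (greedy
# farthest-point sampling); not Lightly Studio's actual algorithm.
import numpy as np

def select_diverse(embeddings: np.ndarray, budget: int) -> list:
    """Greedily pick `budget` indices whose embeddings are far apart."""
    assert budget <= len(embeddings)
    # Seed with the sample closest to the dataset mean (a "representative" pick).
    mean = embeddings.mean(axis=0)
    selected = [int(np.argmin(np.linalg.norm(embeddings - mean, axis=1)))]
    # Distance from every sample to its nearest already-selected sample.
    dist = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)
    while len(selected) < budget:
        # Pick the sample farthest from everything chosen so far (most novel).
        idx = int(np.argmax(dist))
        selected.append(idx)
        dist = np.minimum(dist, np.linalg.norm(embeddings - embeddings[idx], axis=1))
    return selected

# Example: 1,000 placeholder image embeddings, a budget of 100 labels.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(1000, 512))
picked = select_diverse(fake_embeddings, budget=100)
print(f"Selected {len(picked)} of {len(fake_embeddings)} samples for labeling")
```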
Absolutely. The return on investment for curating your data smartly is undeniable.
So just to wrap up the main idea for the listener: Lightly Studio positions itself as that central data hub. It brings together the management, the labeling, and this intelligent, automated curation using selection.
And it also links up with their other tools, like LightlyTrain for pre-training models and LightlyEdge for optimizing data collection out in the field.
So you get the open source flexibility and cost benefits, but paired with enterprise-level security rigor. They mentioned being ISO 27001 certified, which is important for businesses.
All right, that security aspect is key for adoption. OK, that level of control over the data brings us to our final thought for you, the listener, to maybe chew on. We tend to get really fixated on picking the perfect model architecture, tweaking the training parameters, but maybe the real leverage, the real power, actually lies in data curation.
So here's the question. If you could only afford to label, say, 100 images for your next big computer vision project, how confident are you right now that those specific 100 images would be the absolute best, most useful, most diverse set possible to train your model effectively?
Tools like Lightly Studio are fundamentally designed to take the guesswork out of that question and give you some actual, measurable certainty. Something to think about.
Well, thank you for joining us for this deep dive into data management for multimodal ML. And just one final reminder that this deep dive was supported by SafeServer. SafeServer supports your digital transformation and handles hosting for this kind of software. You can find out more at www.safeserver.de.
We'll see you next time on the deep dive.