Welcome back to the deep dive.
If you've been watching fields like computer vision or multimodal machine learning grow, you quickly realize something. It's often not the algorithms holding things back. It's the data: managing it, cleaning it up.
Yeah, and figuring out what's actually worth labeling. That's a huge one.
Exactly. So today, we're doing a deep dive into a platform built specifically for that problem, Lightly Studio. We've gathered a bunch of sources to really unpack this open source tool. Our goal here is pretty simple: break down what Lightly Studio is, who it's for, and walk through the key ideas that, well, aim to turn messy data wrangling into something more automated. Try to make it approachable, even if you're just getting started in this space.
But before we jump into the data pipelines themselves, we really want to thank the supporter of this deep dive, SafeServer. SafeServer handles hosting for exactly this kind of specialized software, and they support your digital transformation. So when you need to deploy tools like Lightly Studio, they provide that crucial infrastructure. You can find out more and start your own journey at www.safeserver.de.
And that infrastructure piece is really key, because scaling these modern ML projects often hits a wall when the data just gets out of control. You've got to curate it, label it right, keep track of every version, every change. It's a lot.
Lightly Studio really pitches itself as that unified data platform for multimodal ML. The idea is consolidating all those tricky, separate steps into one place, making it manageable.
OK, so let's start right there. What is this tool, fundamentally? We know it's from Lightly, it's open source. But what are the main tasks it's trying to unify?
Right, it's about unifying that whole workflow: curation, annotation, and management.
It was really built by ML engineers, for ML engineers and organizations that are trying to scale up computer vision work. They recognize that, well, speed and flexibility are paramount.
And speed, you mentioned. How does it achieve that?
Well, the sources point to a pretty key architectural choice. It's built using Rust.
Rust, OK.
Yeah, and Rust is known for performance, right? And memory safety. So what that means for someone using it is efficiency. You can actually handle massive datasets, at COCO or ImageNet scale, and do some pretty heavy processing, even on, say, standard hardware. Not necessarily a giant server farm.
Like what kind of standard hardware?
I think like a decent laptop, an M1 MacBook Pro with maybe 16 gigs of RAM, that kind of thing.
OK, that's impressive. So it's got this powerful engine. What are the actual functions? What does the platform do day to day?
Yeah, there are really four core functions that kind of cover the whole data lifecycle, representing its main value. First up, you've got label and QA, so built-in tools for annotating images and videos. Pretty essential. Second, it helps you understand and visualize your data. So you can filter it, automatically find exact duplicates, which honestly saves so much time, spot those really important edge cases, and also catch data drift as your real-world conditions change.
Okay, makes sense.
Third, it lets you intelligently curate data. This means automatically selecting the samples that are actually the most valuable for training or fine-tuning your model. We'll probably dig into that more later.
Yeah, definitely want to circle back to that. And the last one?
And finally, after you've done all that work, you need to export and deploy that curated dataset. And it lets you do that whether you're running Lightly Studio on your own machines, on-prem, using a hybrid cloud setup, or fully in the cloud.
Gotcha, flexible deployment.
Exactly. And you know, we keep saying multimodal ML. We should probably clarify that a bit.
Good point.
It means the platform isn't just for, like, standard photos. It really supports a wide range of data types. Images, sure, but also video clips, audio files, text, and importantly, even specialized formats like DICOM data.
Oh, the medical imaging format.
That's the one, yeah, for X-rays, MRIs, that kind of thing. So having that breadth really makes it a genuinely unified hub for different data types.
OK, so it handles a lot of data types. Let's unpack who actually uses this. Who benefits from this unification? Because the sources mention, like, two main groups that don't always talk to each other.
Yeah, that's a good way to put it. It tries to bridge that gap. So on one side, you've got the ML engineers, data scientists, the infrastructure folks.
Yeah. What's in it for them specifically?
Right. For the engineers, it's really all about integration and automation. It's built with SDKs and API support, all based on open source standards. So the idea is it slots into their existing ML stacks pretty easily, without needing a total rewrite of everything.
Less disruption.
Exactly. And automation is key, right? That's handled mainly through the Python SDK. So they can script everything: importing data, managing it. They can pull data straight from where they usually keep it, like local folders or cloud storage, like S3 or GCS.
Those are the big ones from Amazon and Google Cloud, right?
Yeah, the standard object storage. And crucially, once data is in, it's not like it's locked forever. You can keep adding new data to existing datasets as you get it, which is, well, vital for any kind of research or iterative development.
Absolutely. OK, so that's the engineers. Then on the other side, you mentioned labelers and project managers. These might be less technical users.
Often, yeah. They're focused on the quality assurance, managing large teams, logistics. For them, the platform emphasizes more intuitive workflows. So a user-friendly GUI, tools for collaboration, and really critical features like dataset versioning.
Oh, versioning is huge. Anyone who's tried to reproduce an old result knows that pain.
Totally. Knowing exactly which version of the labels you used six months ago, yeah, it's crucial. Plus, things like role-based permissions to manage who can do what in the annotation team, which make sense for project managers.
So it's bridging these two worlds. And interestingly, it seems very focused on making it easy to switch to Lightly Studio. The sources actually call out that it simplifies migrating data from other tools.
Oh, like competitors.
Yeah, they mention names like Encord, Voxel51, Ultralytics, V7 Labs, Roboflow, popular tools in the space. It seems like they actively want to be that central data hub, reducing the friction if the team decides, OK, we need to consolidate onto one platform.
OK, that makes strategic sense. Now, here's where I think it gets really interesting, especially for folks wanting to automate things: the Python interface. We don't need to become Python experts here, but understanding the basic concepts seems key to unlocking that automation power. What are the main building blocks for a beginner?
Yeah, definitely. You can think of it like setting up a big physical filing system; that makes it easier to grasp. So the first main concept is the dataset. If you're using the Python interface, this is sort of your top-level thing. Think of the dataset like your main binder, or the whole filing cabinet. You use it to set up your data, connect to the database file where everything's stored, and kick off the visual interface if you need it. And critically, it's what you use to run your queries and selections to find specific data.
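[Editor's note: to make the filing-cabinet analogy concrete, here is a minimal Python sketch of that dataset setup. The module path and method names (lightly_studio, Dataset.create, add_samples_from_yolo, start_gui) are illustrative assumptions based on the discussion, not confirmed Lightly Studio API.]

```python
# Illustrative sketch only; module and method names are assumptions,
# not confirmed Lightly Studio API.
from lightly_studio import Dataset  # assumed import path

# The dataset is the "filing cabinet": creating it connects to the
# database file where everything is stored.
dataset = Dataset.create(name="traffic-cams", db_path="traffic_cams.db")

# Populate it from a common annotation format (YOLO-style folders here).
dataset.add_samples_from_yolo(images_dir="data/images", labels_dir="data/labels")

# Optionally launch the visual interface to browse what was imported.
dataset.start_gui()
```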
OK, so the dataset is the container. Then what's inside?
Inside, you have the sample. So if the dataset is the binder, the sample is like a single page or a single file inside it. It's just one data instance. Could be one image, one audio clip, whatever.
And what does it know about itself?
It holds all the key info: the unique ID, like a serial number; the file name; where it lives on disk, the absolute path; and importantly, a list of descriptive tags. Tags could be anything, like reviewed, needs labeling, nighttime. Simple labels you attach. And it also gives you access to the metadata. That's all the other descriptive stuff about the file: image resolution, when it was captured, maybe GPS coordinates. Depends on the data.
Dataset holds samples, samples hold info and tags. What's the third piece?
The third piece is the real power move: dataset queries. This is how you find very specific subsets of your data without looking through potentially millions of files manually. Queries let you combine filtering, sorting, and slicing using standard logic, those Boolean expressions: AND, OR, NOT.
So is this just like filtering columns in a spreadsheet, or is it more powerful with this kind of data?
It's way more powerful, because you can query based on the tags and the metadata at the same time. So, for instance, you could build a query like: find all samples that are tagged "needs labeling", OR where the image width is less than 500 pixels AND they have not been tagged as "reviewed" yet.
OK, so you can get really specific, to find potential problems or gaps.
Exactly. It finds that precise set of data that maybe slipped through your initial checks or needs special attention.
That feels like the big win right there, turning data prep from this manual slog into something you can script.
Precisely.
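[Editor's note: the sources describe what queries can do, but not the SDK's exact query syntax. So here is a small, self-contained plain-Python sketch of the idea: samples carry tags and metadata, and a Boolean expression picks out the subset that needs attention. The Sample dataclass and field names below are for illustration only, not the actual Lightly Studio objects.]

```python
# Plain-Python illustration of the tag + metadata query idea; this is
# not the actual Lightly Studio query API.
from dataclasses import dataclass, field

@dataclass
class Sample:
    sample_id: str
    file_name: str
    tags: set = field(default_factory=set)
    metadata: dict = field(default_factory=dict)

samples = [
    Sample("001", "day_0001.jpg", {"reviewed"}, {"width": 1920}),
    Sample("002", "night_0042.jpg", {"needs labeling"}, {"width": 1920}),
    Sample("003", "thumb_0007.jpg", set(), {"width": 320}),
]

# Tagged "needs labeling", OR (width < 500 AND not yet tagged "reviewed").
needs_attention = [
    s for s in samples
    if "needs labeling" in s.tags
    or (s.metadata.get("width", 0) < 500 and "reviewed" not in s.tags)
]

# Act on the subset right away, e.g. tag everything for review so it is
# easy to filter in the visual interface later.
for s in needs_attention:
    s.tags.add("needs review")

print([s.file_name for s in needs_attention])
# ['night_0042.jpg', 'thumb_0007.jpg']
```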
And once you run that query, the really useful part is you can then immediately do something with that subset, like apply a new tag to all of them, say, "needs review". Then later, in the visual interface, they're super easy to find and work on, just by filtering for that tag.
Nice. And for beginners, getting data in seems pretty straightforward, too. The sources mention easy ways to load data from common formats, like YOLO. That's for object detection, right? Bounding boxes?
Yeah. And COCO, which is often used for instance segmentation or image captions. You just use simple Python functions, like an add-samples-from-YOLO or add-samples-from-COCO call. Keeping the barrier to entry low for common formats.
Good. Now, this feels like a good transition to that feature you mentioned earlier, the one that really shows off the platform's advanced side: selection. You said this is where it saves real money and time.
Yeah. This is arguably the core IP, the really smart bit. Selection is basically automated data selection. And the purpose is simple but huge: save potentially massive labeling costs and cut down training time, while actually improving your final model quality.
How does that work? If I have, say, a million images, but only budget to label a few hundred, how does it pick the best few hundred? That sounds tricky.
It is tricky. That's why automation helps. It avoids human bias and, frankly, the tediousness of trying to ensure variety manually. The mechanism works by automatically picking the samples considered most useful. And it does this by balancing two key factors that models need to be robust.
OK, what are the two factors?
First, you need representative samples. This is your core data, the typical stuff your model will see 95% of the time, the normal cases.
Right, the bread and butter.
But if you only train on that, your model falls apart the moment something slightly unusual happens. So second, you need diverse samples. These are the crucial edge cases, the novel or rare examples, the stuff the model hasn't really seen before but needs to handle.
OK, so it's like, if I'm training a self-driving car model, I need lots of pictures of normal daytime driving. That's your representative data. But I also absolutely need examples of driving in heavy rain at night, maybe with a weird object on the road. Those are your diverse, edge-case samples.
Exactly. Selection aims to pick a subset that intelligently balances both.
So if I just labeled 100 pictures of the same boring highway on a sunny day, I've kind of wasted 99 labels. Selection stops that.
That's the idea. It forces variety into your labeled set. And you, the user, get to control this balance. You can use different strategies. For example, a metadata weighting strategy: maybe you tell it to prioritize samples tagged "nighttime" or "rainy", because you know those are hard cases for your model.
OK, using the tags we talked about earlier.
Right. Or you could use something like an embedding diversity strategy. This is more AI-driven. It looks at the actual visual content using embeddings, numerical representations of the images, and picks samples that are mathematically distant, or different, from what's already been selected, even if a human didn't explicitly tag them as diverse.
Wow, OK, that sounds powerful.
It leads to some pretty impressive results, according to the sources. They cite things like up to an 80% cut in annotation costs.
80%?
Yeah. And model iteration cycles getting three times faster. Plus actual accuracy bumps, sometimes 10%, sometimes even as high as 36%.
That's significant. It really underlines that focusing on data quality over just raw quantity pays off.
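[Editor's note: to show what "mathematically distant in embedding space" can mean in practice, here is a generic, self-contained sketch of greedy farthest-point selection over image embeddings. This is a standard diversity-sampling technique used purely for illustration; it is not Lightly Studio's actual selection algorithm, and the embeddings below are random placeholders.]

```python
# Generic illustration of embedding-diversity selection (greedy
# farthest-point sampling); not Lightly Studio's actual algorithm.
import numpy as np

def select_diverse(embeddings: np.ndarray, budget: int) -> list:
    """Greedily pick `budget` indices whose embeddings are far apart."""
    assert budget <= len(embeddings)
    # Seed with the sample closest to the dataset mean (a "representative" pick).
    mean = embeddings.mean(axis=0)
    selected = [int(np.argmin(np.linalg.norm(embeddings - mean, axis=1)))]
    # Distance from every sample to its nearest already-selected sample.
    dist = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)
    while len(selected) < budget:
        # Pick the sample farthest from everything chosen so far (most novel).
        idx = int(np.argmax(dist))
        selected.append(idx)
        dist = np.minimum(dist, np.linalg.norm(embeddings - embeddings[idx], axis=1))
    return selected

# Example: 1,000 placeholder image embeddings, a budget of 100 labels.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(1000, 512))
picked = select_diverse(fake_embeddings, budget=100)
print(f"Selected {len(picked)} of {len(fake_embeddings)} samples for labeling")
```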
Absolutely. The return on investment for curating your data smartly is undeniable.
So just to wrap up the main idea for the listener: Lightly Studio positions itself as that central data hub. It brings together the management, the labeling, and this intelligent, automated curation using selection.
And it also links up with their other tools, like LightlyTrain for pre-training models and LightlyEdge for optimizing data collection out in the field.
So you get the open source flexibility and cost benefits, but paired with enterprise-level security rigor. They mentioned being ISO 27001 certified, which is important for businesses.
All right, that security aspect is key for adoption. OK, that level of control over the data brings us to our final thought for you, the listener, to maybe chew on. We tend to get really fixated on picking the perfect model architecture, tweaking the training parameters, but maybe the real leverage, the real power, actually lies in data curation.
So here's the question. If you could only afford to label, say, 100 images for your next big computer vision project, how confident are you right now that those specific 100 images would be the absolute best, most useful, most diverse set possible to train your model effectively?
Tools like Lightly Studio are fundamentally designed to take the guesswork out of that question and give you some actual, measurable certainty. Something to think about.
Well, thank you for joining us for this deep dive into data management for multimodal ML. And just one final reminder that this deep dive was supported by SafeServer. SafeServer supports your digital transformation and handles hosting for this kind of software. You can find out more at www.safeserver.de.
We'll see you next time on the deep dive.