Okay, let's unpack this. If you're working with AI agents, you've probably run smack into the trust barrier. We're talking about that fundamental problem with large language models, the dreaded hallucination, where the AI just invents stuff.

Yeah, invents facts. And it's more than just annoying, right? It's a huge challenge if your agent needs to know about your specific, maybe internal, knowledge.

Exactly. So today we're doing a deep dive into the tech built to fix this trust crisis: Retrieval Augmented Generation, RAG for short.

Ah. But before we really get into the weeds of grounding these agents, we really want to thank SafeServer.

Ah, yes. They focus on hosting exactly this kind of complex, cutting-edge software. They're all about supporting your digital transformation journey, making sure you've got the right setup for advanced RAG apps. You can find out more about how they help with hosting over at www.safeserver.de.

So, our mission today: to give you a crucial shortcut. We're going to demystify this platform called Agentset. It's designed so pretty much anyone can build these really reliable, traceable, frontier RAG apps. We'll break down how it works, so even if you're totally new to RAG, you'll get why it promises to skip all that painful, expensive trial and error you often see.

And that's key, focusing on beginners too. Because RAG, well, fundamentally it's about giving the AI proof. You ground the agent in a specific knowledge base so it stops being a general know-it-all and becomes an expert on your stuff, your documents. We're looking at a system designed to make that really complex engineering job more plug-and-play.

Right, reliable answers right out of the box.

So let's start right there with that fundamental difference. The promise is building reliable AI agents fast, cutting down hallucinations and, you know, impressing people from the get-go.

Yeah, and it's interesting to think about the pain point Agentset is trying to solve here.
If you, the listener, tried the DIY route, maybe using tools like LangChain or LlamaIndex, the source material suggests you hit a wall pretty fast: a steep learning curve, complex setup, loads of boilerplate code, and maybe worst of all, the retrieval quality. It's just all over the place. Inconsistent.

Let's pause on that inconsistency. What does that complexity actually mean for an engineer? It's not just like calling one API, is it?

Oh, absolutely not. No. The trouble starts immediately with getting the documents in and chopping them up: ingestion and chunking. When you build RAG yourself, you have to figure out, okay, how do I break this huge document into pieces small enough for the LLM, but big enough to keep the meaning?

Right.

Do I use paragraphs, a fixed number of words, some recursive method? You choose wrong, and the whole retrieval thing can just fail. It's a huge decision that, if you're doing it yourself, needs tons of tuning and testing.

And Agentset comes in and basically says, look, we've got a ready-to-use engine. It handles that complexity, those architectural choices, right away.

And it starts with ingestion.

Exactly. Your documents, your knowledge, get automatically parsed, and from over 22 file formats.

That's a lot.

It is. And it's important because it's not just the easy ones like PDFs and Word docs. It includes tricky ones like EML emails, CSVs, even image files like BMP, and complex XMLs. That breadth alone solves a huge integration headache for developers dealing with messy corporate data silos.

Okay, so the documents are in. How does the system prep them for retrieval?

The platform uses its own built-in chunking strategy. It automatically breaks everything down into these manageable, searchable bits, trying to maximize the context. Then they get embedded, turned into numbers basically, vectors, and stored in a vector database. This makes finding them later really fast using math. That whole ingestion and prep stage is where a lot of DIY RAG projects stumble because of bad choices early on. Agentset aims to make those good choices for you.
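To make the chunking trade-off concrete, here is a minimal sketch of the recursive approach just mentioned: split on paragraphs first, then sentences, then words, until every piece fits. This is a generic, hypothetical splitter for illustration; Agentset's built-in strategy isn't published in the source material, and a production splitter would also preserve separators and add overlap between chunks.

```python
def recursive_chunk(text: str, max_chars: int = 200,
                    separators: tuple = ("\n\n", "\n", ". ", " ")) -> list[str]:
    """Split on the coarsest separator that keeps every piece under max_chars."""
    if len(text) <= max_chars:
        return [text]
    for sep in separators:
        parts = [p for p in text.split(sep) if p]
        if len(parts) > 1:
            chunks: list[str] = []
            for part in parts:
                # Oversized parts fall through to the next, finer separator.
                chunks.extend(recursive_chunk(part, max_chars, separators))
            return chunks
    # No separator helped: fall back to a hard character split.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

document = (
    "Refund policy.\n\n"
    "Refunds are processed within 14 days of a return request. "
    "Items must be unused and in original packaging.\n\n"
    "Contact support for exceptions."
)
for chunk in recursive_chunk(document, max_chars=60):
    print(repr(chunk))
```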
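And the embed-and-store step might look roughly like this. The toy embedding function and in-memory "database" are stand-ins; a real deployment would call its chosen embedding model and vector database client, both of which, as discussed later, Agentset lets you pick.

```python
import math

def fake_embed(chunk: str, dims: int = 8) -> list[float]:
    """Placeholder for a real embedding-model call (an OpenAI or Cohere
    embeddings endpoint, say): deterministic junk numbers, normalized."""
    raw = [float((hash(chunk) >> (4 * i)) & 0xF) + 1.0 for i in range(dims)]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]

# Chunks would come from a splitter like the one sketched above; a naive
# paragraph split keeps this example self-contained.
document = "Refund policy.\n\nRefunds take 14 days.\n\nContact support."
chunks = document.split("\n\n")

vector_db = [  # stand-in for a real vector database collection
    {"id": i,
     "text": chunk,                       # kept so answers can cite the source
     "embedding": fake_embed(chunk),
     "metadata": {"source": "policy.txt"}}
    for i, chunk in enumerate(chunks)
]
print(f"stored {len(vector_db)} chunks of {len(vector_db[0]['embedding'])} dims")
```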
Okay, let's move to section two then: the core function. How does Agentset actually guarantee accuracy and fight off those hallucinations? This is really key if you need dependable, enterprise-grade answers.

Right. They aim for reliable accuracy by using what the sources call best-in-class RAG techniques right from the start. They've essentially optimized the retrieval bit before you even think about customizing anything.

Two features really jumped out at me from the material: hybrid search and re-ranking. Let's unpack why these are like safety nets against bad answers.

Okay, hybrid search is kind of the proactive step. See, basic vector search is good at finding stuff that's semantically similar, chunks talking about the same topic.

Getting similar, yeah.

But similar meaning doesn't always mean it's the right context for the specific question. Hybrid search casts a wider net. It combines that vector search with good old keyword and full-text search. This finds more potentially relevant bits of information, making sure something isn't missed just because the vector math was slightly off.

And then re-ranking. That's the quality control, right? Hybrid search finds maybe a thousand relevant-looking chunks. How does the system pick the best three to actually show the LLM?

Precisely. Re-ranking is like the final editor. It takes all those candidates from the hybrid search and sorts them based on true relevance and quality. It ensures the absolute best, most contextually spot-on material gets passed to the large language model. That's how you get the highest accuracy: by cleaning up the retrieved information before the LLM even sees it.

That's a critical distinction. The AI isn't just grabbing nearby stuff. It's prioritizing the quality of the evidence.
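Sketched in code, that retrieve-then-rerank pattern might look like the following. The scoring functions are deliberately toy-sized (real systems compare dense embeddings and use a cross-encoder re-ranker), and the merge step uses reciprocal rank fusion, one common way to combine keyword and vector rankings; none of this is Agentset's actual implementation.

```python
from collections import defaultdict

def keyword_score(query: str, doc: str) -> float:
    """Toy full-text score: how often the query terms appear in the chunk."""
    return float(sum(doc.lower().count(t) for t in query.lower().split()))

def vector_score(query: str, doc: str) -> float:
    """Stand-in for embedding similarity. A real system compares dense
    vectors from an embedding model; here, word overlap approximates it."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q) or 1)

def hybrid_search(query: str, docs: list[str], k: int = 10) -> list[str]:
    """Cast the wider net: run both searches, then merge the two rankings
    with reciprocal rank fusion (RRF)."""
    fused: dict[str, float] = defaultdict(float)
    for scorer in (keyword_score, vector_score):
        ranked = sorted(docs, key=lambda d: scorer(query, d), reverse=True)
        for rank, doc in enumerate(ranked):
            fused[doc] += 1.0 / (60 + rank)  # 60 is the customary RRF constant
    return sorted(fused, key=fused.get, reverse=True)[:k]

def rerank(query: str, candidates: list[str], top_n: int = 3) -> list[str]:
    """The 'final editor': a production re-ranker scores each (query, chunk)
    pair with a cross-encoder model; this toy reuses the keyword score."""
    return sorted(candidates, key=lambda d: keyword_score(query, d),
                  reverse=True)[:top_n]

docs = [
    "Refunds are processed within 14 days of a return request.",
    "Our refund policy changed in Q4 2023; see the legal addendum.",
    "The marketing team launched a new brand campaign in 2023.",
]
query = "refund policy 2023"
best = rerank(query, hybrid_search(query, docs))
print(best)  # the few best chunks that would be handed to the LLM
```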
Exactly. And they add another layer, too: built-in support for deep research. You can choose a quick answer or a deeper dive. Deep research takes longer, naturally, but it looks at way more sources and gives back really in-depth answers with more context. Great for complex questions or high-stakes decisions.

Then there's what may be the most vital feature for building trust with the person asking the question. Citations.

Absolutely non-negotiable, usually. The system automatically cites the exact sources for its answers. This lets you, the user, click through and see the original document, the page, even the paragraph where the AI got its info. In a business setting, that traceability is essential for compliance, for validation.

And building on that control idea, there's metadata filtering. Give us a quick, practical example of why that matters.

Sure. So this lets you limit the AI's answers to only a specific slice of your data, based on tags you added when you uploaded the documents. Imagine a big company. You might need an agent that only answers using documents tagged Legal 2023 Q4 to make sure it's compliant, maybe excluding marketing stuff entirely. It keeps the agent operating within very specific boundaries. Again, making sure the answers are traceable and vetted, from your chosen sources.
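Put together, a filtered, citation-bearing query might look something like this sketch. The data model and field names are invented for illustration (the source doesn't specify Agentset's response schema); the point is the pattern: scope retrieval by tags first, and return the evidence alongside the answer.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str   # originating document, for click-through citations
    page: int
    tags: dict

KNOWLEDGE = [
    Chunk("Liability is capped at the fees paid in the prior 12 months.",
          "msa.pdf", 12, {"dept": "legal", "period": "2023-Q4"}),
    Chunk("The holiday campaign doubled trial signups.",
          "campaign-recap.pptx", 3, {"dept": "marketing", "period": "2023-Q4"}),
]

def answer(query: str, filters: dict) -> dict:
    # 1. Scope retrieval to the tagged slice of the knowledge base.
    scoped = [c for c in KNOWLEDGE
              if all(c.tags.get(k) == v for k, v in filters.items())]
    # 2. Hybrid search + re-ranking would run on `scoped` here, and the LLM
    #    would generate from the surviving chunks. Both steps are stubbed.
    evidence = scoped[:3]
    return {
        "answer": f"(LLM answer grounded in {len(evidence)} chunk(s))",
        "citations": [{"source": c.source, "page": c.page, "quote": c.text}
                      for c in evidence],
    }

# Only legal Q4 documents can influence this answer; marketing is excluded.
print(answer("What is our liability cap?",
             {"dept": "legal", "period": "2023-Q4"}))
```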
Okay. That level of reliability, that optimized architecture, usually takes a dedicated, expensive engineering team. But Agentset talks about getting to production in hours. So let's shift to developer experience and flexibility: section three.

Right. If we pivot from accuracy to just making it easy to implement, what's really important for scaling is how accessible they've made deployment. They offer ready-to-go SDKs for JavaScript and Python, clean APIs, typed SDKs too. This means developers can upload data and plug it into existing systems fast, without wrestling with, you know, messy or undocumented code.

Okay. But let's be a bit skeptical. If Agentset is prepackaging all these fancy RAG techniques, what's the catch? Am I locked into their way of doing things? Their cloud? Their choice of AI model?

Excellent question. And the source material really stresses this. Agentset is extremely model-agnostic. You are specifically not locked into one vendor's AI. That's a huge strategic plus. You keep control: you pick your own vector database, your own embedding model, and, critically, your own large language model.

And that's not just a tech detail, is it? That's about cost, it's about strategy.

Absolutely. It lets you fine-tune performance and manage your budget. Maybe you use a powerful, pricey model like GPT-4 for that deep research feature, or maybe a cheaper, faster model from Anthropic or Cohere for everyday customer questions. Agentset works with all the big players: OpenAI, Anthropic, Google AI, Cohere. You keep control of your underlying tech stack and avoid getting locked in.
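What that swappability buys you is easiest to see in code. Here is a hypothetical sketch (none of this is Agentset's actual SDK surface, which the source doesn't reproduce) of routing the same grounded prompt to different LLMs behind one interface: a pricey model for deep research, a cheap one for everyday questions.

```python
from typing import Protocol

class LLM(Protocol):
    def complete(self, prompt: str) -> str: ...

class FrontierModel:
    """Stand-in for a powerful, pricey model (a GPT-4-class endpoint, say)
    that you might reserve for the deep-research mode."""
    def complete(self, prompt: str) -> str:
        return f"[thorough answer to: {prompt[-40:]}]"

class BudgetModel:
    """Stand-in for a cheaper, faster model (an Anthropic or Cohere model,
    say) for everyday customer questions."""
    def complete(self, prompt: str) -> str:
        return f"[quick answer to: {prompt[-40:]}]"

def answer_with(llm: LLM, question: str, context: list[str]) -> str:
    """The grounded prompt stays identical; only the model behind it changes."""
    prompt = ("Answer ONLY from the context below.\n\n"
              + "\n".join(context)
              + f"\n\nQ: {question}")
    return llm.complete(prompt)

context = ["Refunds are processed within 14 days of a return request."]
print(answer_with(FrontierModel(), "Summarize our refund obligations.", context))
print(answer_with(BudgetModel(), "How long do refunds take?", context))
```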
That makes a lot of sense. Now, once I've got Agentset running, how do I actually connect that knowledge base to other applications?

So for bringing that knowledge out, they offer a couple of ways. One is just easy integration using their standard AI SDK. Pretty straightforward. But for connecting that powerful, grounded knowledge base to external apps or maybe microservices, they have something called the Model Context Protocol server, or MCP server.

Okay. What does the MCP server do, exactly?

Think of it like a secure gateway, a dedicated one. It lets your other applications query the Agentset knowledge base, that whole sophisticated RAG engine, without needing to rebuild the retrieval and LLM logic themselves. It essentially serves up the contextual proof, ready for any external app to use to generate a reliable answer.
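From the calling application's side, that gateway pattern might look roughly like this. The route, payload, and response fields below are invented for illustration; consult the actual MCP server documentation for the real interface. The point is simply that the caller never re-implements retrieval.

```python
import json
import urllib.request

def query_knowledge_base(question: str,
                         base_url: str = "http://localhost:3000") -> dict:
    """Hypothetical gateway client: POST a question, get back grounded
    chunks and citations, ready to feed into this app's own LLM call."""
    payload = json.dumps({
        "query": question,
        "mode": "quick",   # invented knob; think quick answer vs. deep research
    }).encode()
    request = urllib.request.Request(
        f"{base_url}/retrieve",   # illustrative route, not a documented one
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# An external microservice would then generate its answer from the returned
# evidence without owning any RAG logic of its own, e.g.:
#   result = query_knowledge_base("What changed in the Q4 legal addendum?")
#   print(result["chunks"], result["citations"])
```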
And to help developers get going faster, they even threw in a chat playground.

Yes. With message editing and citations built right in, it's brilliant for just quickly trying things out, prototyping. Developers can immediately see if the RAG is working well and check the accuracy of answers without having to push anything live. It cuts down testing time dramatically.

Okay, let's move to section four. This is huge: security and control. Especially when we're talking about grounding AI in sensitive, proprietary company data, what protections are baked in? How do they build that trust?

Security seems to be multilayered, often aiming higher than standard practice. Your data is secured with end-to-end encryption using bank-grade AES-256, and all the data moving around is secured using TLS. Standard, but essential.

But the real decider for many companies is control. Who actually owns the data? Who controls the infrastructure? Where does it live?

Absolutely. And the key differentiator here seems to be flexibility and hosting control. The platform lets you host your data on top of your own vector database, your own storage bucket, and your own chosen AI models. For sensitive systems, keeping ownership and control over that data stack is paramount.

And what about organizations with really strict rules, especially around where data can physically be? Data residency requirements?

Yep. For compliance needs, Agentset offers specific options for EU data residency. That means ensuring data is processed only on servers within the EU, which ticks a big box for GDPR and similar regulations. And for the absolute maximum control, maybe for finance or healthcare, they support on-premise deployment.

So on-prem means putting the whole Agentset system behind your own company firewall, in your own cloud.

That's exactly it. It lets you deploy Agentset inside your existing cloud environment, AWS, Azure, GCP, whatever, but completely under your security rules, behind your firewalls. That's the ultimate level of control for really critical applications.

So it sounds like if you're looking to adopt this, there are two pretty clear paths depending on your needs.

Yeah, basically. First, there's Agentset Cloud. That's the quickest way to get started, kick the tires. Crucially, they have a generous free tier: 1,000 pages of documents you can ingest, 10,000 retrievals. Makes it really easy to experiment without commitment.

And then, for those who need that total control, you can self-host.

Right. Because Agentset is open source under the MIT license, which is very permissive, you can just download the whole thing and run it yourself. That gives you absolute maximum technical control over the entire RAG setup, the knowledge, the infrastructure. It's really a choice between getting to market fast versus having complete, uncompromised control.

So, to kind of wrap up the key takeaway for you, the listener: Agentset takes the really painful, complex bits of RAG (getting diverse data in, figuring out chunking, doing advanced retrieval like hybrid search, adding citations) and packages it all into an accessible engine that's accurate out of the box. It essentially lets you skip building that whole complex infrastructure from scratch. You bypass the RAG engineering headaches and can focus straight away on building a reliable AI app you can actually trust.

And here's where it gets really interesting for me, the final thought. Given how easy they're making deployment, the model flexibility, these powerful accuracy features like hybrid search and deep research, how quickly will these traceable, knowledge-backed AI agents just become the standard? Will they totally replace the simpler, less reliable chatbots that, you know, can't actually prove where they got their answers from using your specific documents? It feels like the demand for trust is making traceability basically mandatory.

A huge thank you once again to SafeServer for supporting this deep dive and enabling digital transformation. If you want to explore hosting solutions for this kind of advanced software, please do visit www.safeserver.de. And thank you for joining us for this deep dive. Go forth and ground your agents.