Okay, let's unpack this. If you're working with AI agents, you've probably run smack into the trust barrier. We're talking about that fundamental problem with large language models, the dreaded hallucination, where the AI just invents stuff.

Yeah, invents facts. And it's more than just annoying, right? It's a huge challenge if your agent needs to know about your specific, maybe internal, knowledge.

Exactly. So today we're doing a deep dive into the tech built to fix this trust crisis: Retrieval Augmented Generation, RAG for short.

Ah. But before we really get into the weeds of grounding these agents, we really want to thank SafeServer.

Ah, yes. They focus on hosting exactly this kind of complex, cutting-edge software. They're all about supporting your digital transformation journey, making sure you've got the right setup for advanced RAG apps. You can find out more about how they help with hosting over at www.safeserver.de.

So, our mission today: to give you a crucial shortcut. We're going to demystify this platform called Agentset. It's designed so pretty much anyone can build these really reliable, traceable, frontier RAG apps. We'll break down how it works, so even if you're totally new to RAG, you'll get why it promises to skip all that painful, expensive trial and error you often see.

And that's key, focusing on beginners too. Because RAG, well, fundamentally it's about giving the AI proof. You ground the agent in a specific knowledge base so it stops being a general know-it-all and becomes an expert on your stuff, your documents. We're looking at a system designed to make that really complex engineering job more plug-and-play.

Right, reliable answers right out of the box.

So let's start right there with that fundamental difference. The promise is building reliable AI agents fast, cutting down hallucinations and, you know, impressing people from the get-go.

Yeah, and it's interesting to think about the pain point Agentset is trying to solve here.
If you, the listener, tried the DIY route, maybe using tools like LangChain or LlamaIndex, the source material suggests you hit a wall pretty fast: a steep learning curve, complex setup, loads of boilerplate code, and maybe worst of all, the retrieval quality. It's just all over the place. Inconsistent.

Let's pause on that inconsistency. What does that complexity actually mean for an engineer? It's not just like calling one API, is it?

Oh, absolutely not. No. The trouble starts immediately with getting the documents in and chopping them up: ingestion and chunking. When you build RAG yourself, you have to figure out, okay, how do I break this huge document into pieces small enough for the LLM, but big enough to keep the meaning?

Right.

Do I use paragraphs, a fixed number of words, some recursive method? You choose wrong, and the whole retrieval thing can just fail. It's a huge decision that, if you're doing it yourself, needs tons of tuning and testing.

And Agentset comes in and basically says, look, we've got a ready-to-use engine. It handles that complexity, those architectural choices, right away.

And it starts with ingestion.

Exactly. Your documents, your knowledge, get automatically parsed, and from over 22 file formats.

That's a lot.

It is. And it's important because it's not just the easy ones like PDFs and Word docs. It includes tricky ones like EML emails, CSVs, even image files like BMP, and complex XMLs. That breadth alone solves a huge integration headache for developers dealing with messy corporate data silos.

Okay, so the documents are in. How does the system prep them for retrieval?

The platform uses its own built-in chunking strategy. It automatically breaks everything down into these manageable, searchable bits, trying to maximize the context. Then they get embedded, turned into numbers basically, vectors, and stored in a vector database. This makes finding them later really fast using math. That whole ingestion and prep stage is where a lot of DIY RAG projects stumble because of bad choices early on. Agentset aims to make those good choices for you.
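To make the chunking trade-off concrete, here is a minimal sketch of the recursive approach just mentioned: split on paragraphs first, then sentences, then words, until every piece fits. This is a generic, hypothetical splitter for illustration; Agentset's built-in strategy isn't published in the source material, and a production splitter would also preserve separators and add overlap between chunks.

```python
def recursive_chunk(text: str, max_chars: int = 200,
                    separators: tuple = ("\n\n", "\n", ". ", " ")) -> list[str]:
    """Split on the coarsest separator that keeps every piece under max_chars."""
    if len(text) <= max_chars:
        return [text]
    for sep in separators:
        parts = [p for p in text.split(sep) if p]
        if len(parts) > 1:
            chunks: list[str] = []
            for part in parts:
                # Oversized parts fall through to the next, finer separator.
                chunks.extend(recursive_chunk(part, max_chars, separators))
            return chunks
    # No separator helped: fall back to a hard character split.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

document = (
    "Refund policy.\n\n"
    "Refunds are processed within 14 days of a return request. "
    "Items must be unused and in original packaging.\n\n"
    "Contact support for exceptions."
)
for chunk in recursive_chunk(document, max_chars=60):
    print(repr(chunk))
```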
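And the embed-and-store step might look roughly like this. The toy embedding function and in-memory "database" are stand-ins; a real deployment would call its chosen embedding model and vector database client, both of which, as discussed later, Agentset lets you pick.

```python
import math

def fake_embed(chunk: str, dims: int = 8) -> list[float]:
    """Placeholder for a real embedding-model call (an OpenAI or Cohere
    embeddings endpoint, say): deterministic junk numbers, normalized."""
    raw = [float((hash(chunk) >> (4 * i)) & 0xF) + 1.0 for i in range(dims)]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]

# Chunks would come from a splitter like the one sketched above; a naive
# paragraph split keeps this example self-contained.
document = "Refund policy.\n\nRefunds take 14 days.\n\nContact support."
chunks = document.split("\n\n")

vector_db = [  # stand-in for a real vector database collection
    {"id": i,
     "text": chunk,                       # kept so answers can cite the source
     "embedding": fake_embed(chunk),
     "metadata": {"source": "policy.txt"}}
    for i, chunk in enumerate(chunks)
]
print(f"stored {len(vector_db)} chunks of {len(vector_db[0]['embedding'])} dims")
```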
Okay, let's move to section two then: the core function. How does Agentset actually guarantee accuracy and fight off those hallucinations? This is really key if you need dependable, enterprise-grade answers.

Right. They aim for reliable accuracy by using what the sources call best-in-class RAG techniques right from the start. They've essentially optimized the retrieval bit before you even think about customizing anything.

Two features really jumped out at me from the material: hybrid search and re-ranking. Let's unpack why these are like safety nets against bad answers.

Okay, hybrid search is kind of the proactive step. See, basic vector search is good at finding stuff that's semantically similar, chunks talking about the same topic.

Getting similar, yeah.

But similar meaning doesn't always mean it's the right context for the specific question. Hybrid search casts a wider net. It combines that vector search with good old keyword and full-text search. This finds more potentially relevant bits of information, making sure something isn't missed just because the vector math was slightly off.

And then re-ranking. That's the quality control, right? Hybrid search finds maybe a thousand relevant-looking chunks. How does the system pick the best three to actually show the LLM?

Precisely. Re-ranking is like the final editor. It takes all those candidates from the hybrid search and sorts them based on true relevance and quality. It ensures the absolute best, most contextually spot-on material gets passed to the large language model. That's how you get the highest accuracy: by cleaning up the retrieved information before the LLM even sees it.

That's a critical distinction. The AI isn't just grabbing nearby stuff. It's prioritizing the quality of the evidence.
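Sketched in code, that retrieve-then-rerank pattern might look like the following. The scoring functions are deliberately toy-sized (real systems compare dense embeddings and use a cross-encoder re-ranker), and the merge step uses reciprocal rank fusion, one common way to combine keyword and vector rankings; none of this is Agentset's actual implementation.

```python
from collections import defaultdict

def keyword_score(query: str, doc: str) -> float:
    """Toy full-text score: how often the query terms appear in the chunk."""
    return float(sum(doc.lower().count(t) for t in query.lower().split()))

def vector_score(query: str, doc: str) -> float:
    """Stand-in for embedding similarity. A real system compares dense
    vectors from an embedding model; here, word overlap approximates it."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q) or 1)

def hybrid_search(query: str, docs: list[str], k: int = 10) -> list[str]:
    """Cast the wider net: run both searches, then merge the two rankings
    with reciprocal rank fusion (RRF)."""
    fused: dict[str, float] = defaultdict(float)
    for scorer in (keyword_score, vector_score):
        ranked = sorted(docs, key=lambda d: scorer(query, d), reverse=True)
        for rank, doc in enumerate(ranked):
            fused[doc] += 1.0 / (60 + rank)  # 60 is the customary RRF constant
    return sorted(fused, key=fused.get, reverse=True)[:k]

def rerank(query: str, candidates: list[str], top_n: int = 3) -> list[str]:
    """The 'final editor': a production re-ranker scores each (query, chunk)
    pair with a cross-encoder model; this toy reuses the keyword score."""
    return sorted(candidates, key=lambda d: keyword_score(query, d),
                  reverse=True)[:top_n]

docs = [
    "Refunds are processed within 14 days of a return request.",
    "Our refund policy changed in Q4 2023; see the legal addendum.",
    "The marketing team launched a new brand campaign in 2023.",
]
query = "refund policy 2023"
best = rerank(query, hybrid_search(query, docs))
print(best)  # the few best chunks that would be handed to the LLM
```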
Exactly. And they add another layer, too: built-in support for deep research. You can choose a quick answer or a deeper dive. Deep research takes longer, naturally, but it looks at way more sources and gives back really in-depth answers with more context. Great for complex questions or high-stakes decisions.

Then there's what may be the most vital feature for building trust with the person asking the question. Citations.

Absolutely non-negotiable, usually. The system automatically cites the exact sources for its answers. This lets you, the user, click through and see the original document, the page, even the paragraph where the AI got its info. In a business setting, that traceability is essential for compliance, for validation.

And building on that control idea, there's metadata filtering. Give us a quick, practical example of why that matters.

Sure. So this lets you limit the AI's answers to only a specific slice of your data, based on tags you added when you uploaded the documents. Imagine a big company. You might need an agent that only answers using documents tagged Legal 2023 Q4 to make sure it's compliant, maybe excluding marketing stuff entirely. It keeps the agent operating within very specific boundaries. Again, making sure the answers are traceable and vetted, from your chosen sources.
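Put together, a filtered, citation-bearing query might look something like this sketch. The data model and field names are invented for illustration (the source doesn't specify Agentset's response schema); the point is the pattern: scope retrieval by tags first, and return the evidence alongside the answer.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str   # originating document, for click-through citations
    page: int
    tags: dict

KNOWLEDGE = [
    Chunk("Liability is capped at the fees paid in the prior 12 months.",
          "msa.pdf", 12, {"dept": "legal", "period": "2023-Q4"}),
    Chunk("The holiday campaign doubled trial signups.",
          "campaign-recap.pptx", 3, {"dept": "marketing", "period": "2023-Q4"}),
]

def answer(query: str, filters: dict) -> dict:
    # 1. Scope retrieval to the tagged slice of the knowledge base.
    scoped = [c for c in KNOWLEDGE
              if all(c.tags.get(k) == v for k, v in filters.items())]
    # 2. Hybrid search + re-ranking would run on `scoped` here, and the LLM
    #    would generate from the surviving chunks. Both steps are stubbed.
    evidence = scoped[:3]
    return {
        "answer": f"(LLM answer grounded in {len(evidence)} chunk(s))",
        "citations": [{"source": c.source, "page": c.page, "quote": c.text}
                      for c in evidence],
    }

# Only legal Q4 documents can influence this answer; marketing is excluded.
print(answer("What is our liability cap?",
             {"dept": "legal", "period": "2023-Q4"}))
```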
Okay. That level of reliability, that optimized architecture, usually takes a dedicated, expensive engineering team. But Agentset talks about getting to production in hours. So let's shift to developer experience and flexibility: section three.

Right. If we pivot from accuracy to just making it easy to implement, what's really important for scaling is how accessible they've made deployment. They offer ready-to-go SDKs for JavaScript and Python, clean APIs, typed SDKs too. This means developers can upload data and plug it into existing systems fast, without wrestling with, you know, messy or undocumented code.

Okay. But let's be a bit skeptical. If Agentset is prepackaging all these fancy RAG techniques, what's the catch? Am I locked into their way of doing things? Their cloud? Their choice of AI model?

Excellent question. And the source material really stresses this. Agentset is extremely model-agnostic. You are specifically not locked into one vendor's AI. That's a huge strategic plus. You keep control: you pick your own vector database, your own embedding model, and, critically, your own large language model.

And that's not just a tech detail, is it? That's about cost, it's about strategy.

Absolutely. It lets you fine-tune performance and manage your budget. Maybe you use a powerful, pricey model like GPT-4 for that deep research feature, or maybe a cheaper, faster model from Anthropic or Cohere for everyday customer questions. Agentset works with all the big players: OpenAI, Anthropic, Google AI, Cohere. You keep control of your underlying tech stack and avoid getting locked in.
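What that swappability buys you is easiest to see in code. Here is a hypothetical sketch (none of this is Agentset's actual SDK surface, which the source doesn't reproduce) of routing the same grounded prompt to different LLMs behind one interface: a pricey model for deep research, a cheap one for everyday questions.

```python
from typing import Protocol

class LLM(Protocol):
    def complete(self, prompt: str) -> str: ...

class FrontierModel:
    """Stand-in for a powerful, pricey model (a GPT-4-class endpoint, say)
    that you might reserve for the deep-research mode."""
    def complete(self, prompt: str) -> str:
        return f"[thorough answer to: {prompt[-40:]}]"

class BudgetModel:
    """Stand-in for a cheaper, faster model (an Anthropic or Cohere model,
    say) for everyday customer questions."""
    def complete(self, prompt: str) -> str:
        return f"[quick answer to: {prompt[-40:]}]"

def answer_with(llm: LLM, question: str, context: list[str]) -> str:
    """The grounded prompt stays identical; only the model behind it changes."""
    prompt = ("Answer ONLY from the context below.\n\n"
              + "\n".join(context)
              + f"\n\nQ: {question}")
    return llm.complete(prompt)

context = ["Refunds are processed within 14 days of a return request."]
print(answer_with(FrontierModel(), "Summarize our refund obligations.", context))
print(answer_with(BudgetModel(), "How long do refunds take?", context))
```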
That makes a lot of sense. Now, once I've got Agentset running, how do I actually connect that knowledge base to other applications?

So for bringing that knowledge out, they offer a couple of ways. One is just easy integration using their standard AI SDK. Pretty straightforward. But for connecting that powerful, grounded knowledge base to external apps or maybe microservices, they have something called the Model Context Protocol server, or MCP server.

Okay. What does the MCP server do, exactly?

Think of it like a secure gateway, a dedicated one. It lets your other applications query the Agentset knowledge base, that whole sophisticated RAG engine, without needing to rebuild the retrieval and LLM logic themselves. It essentially serves up the contextual proof, ready for any external app to use to generate a reliable answer.
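From the calling application's side, that gateway pattern might look roughly like this. The route, payload, and response fields below are invented for illustration; consult the actual MCP server documentation for the real interface. The point is simply that the caller never re-implements retrieval.

```python
import json
import urllib.request

def query_knowledge_base(question: str,
                         base_url: str = "http://localhost:3000") -> dict:
    """Hypothetical gateway client: POST a question, get back grounded
    chunks and citations, ready to feed into this app's own LLM call."""
    payload = json.dumps({
        "query": question,
        "mode": "quick",   # invented knob; think quick answer vs. deep research
    }).encode()
    request = urllib.request.Request(
        f"{base_url}/retrieve",   # illustrative route, not a documented one
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# An external microservice would then generate its answer from the returned
# evidence without owning any RAG logic of its own, e.g.:
#   result = query_knowledge_base("What changed in the Q4 legal addendum?")
#   print(result["chunks"], result["citations"])
```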
And to help developers get going faster, they even threw in a chat playground.

Yes. With message editing and citations built right in, it's brilliant for just quickly trying things out, prototyping. Developers can immediately see if the RAG is working well and check the accuracy of answers without having to push anything live. It cuts down testing time dramatically.

Okay, let's move to section four. This is huge: security and control. Especially when we're talking about grounding AI in sensitive, proprietary company data, what protections are baked in? How do they build that trust?

Security seems to be multilayered, often aiming higher than standard practice. Your data is secured with end-to-end encryption using bank-grade AES-256, and all the data moving around is secured using TLS. Standard, but essential.

But the real decider for many companies is control. Who actually owns the data? Who controls the infrastructure? Where does it live?

Absolutely. And the key differentiator here seems to be flexibility and hosting control. The platform lets you host your data on top of your own vector database, your own storage bucket, and your own chosen AI models. For sensitive systems, keeping ownership and control over that data stack is paramount.

And what about organizations with really strict rules, especially around where data can physically be? Data residency requirements?

Yep. For compliance needs, Agentset offers specific options for EU data residency. That means ensuring data is processed only on servers within the EU, which ticks a big box for GDPR and similar regulations. And for the absolute maximum control, maybe for finance or healthcare, they support on-premise deployment.

So on-prem means putting the whole Agentset system behind your own company firewall, in your own cloud.

That's exactly it. It lets you deploy Agentset inside your existing cloud environment, AWS, Azure, GCP, whatever, but completely under your security rules, behind your firewalls. That's the ultimate level of control for really critical applications.

So it sounds like if you're looking to adopt this, there are two pretty clear paths depending on your needs.

Yeah, basically. First, there's Agentset Cloud. That's the quickest way to get started, kick the tires. Crucially, they have a generous free tier: 1,000 pages of documents you can ingest, 10,000 retrievals. Makes it really easy to experiment without commitment.

And then, for those who need that total control, you can self-host.

Right. Because Agentset is open source under the MIT license, which is very permissive, you can just download the whole thing and run it yourself. That gives you absolute maximum technical control over the entire RAG setup, the knowledge, the infrastructure. It's really a choice between getting to market fast versus having complete, uncompromised control.

So, to kind of wrap up the key takeaway for you, the listener: Agentset takes the really painful, complex bits of RAG (getting diverse data in, figuring out chunking, doing advanced retrieval like hybrid search, adding citations) and packages it all into an accessible engine that's accurate out of the box. It essentially lets you skip building that whole complex infrastructure from scratch. You bypass the RAG engineering headaches and can focus straight away on building a reliable AI app you can actually trust.

And here's where it gets really interesting for me, the final thought. Given how easy they're making deployment, the model flexibility, these powerful accuracy features like hybrid search and deep research, how quickly will these traceable, knowledge-backed AI agents just become the standard? Will they totally replace the simpler, less reliable chatbots that, you know, can't actually prove where they got their answers from using your specific documents? It feels like the demand for trust is making traceability basically mandatory.

A huge thank you once again to SafeServer for supporting this deep dive and enabling digital transformation. If you want to explore hosting solutions for this kind of advanced software, please do visit www.safeserver.de. And thank you for joining us for this deep dive. Go forth and ground your agents.