Welcome to the Deep Dive. Today we are jumping right into a really core problem for modern AI applications. It's all about getting AI to actually understand the documents you give it, especially the tricky ones, you know, visual stuff, complex PDFs, multimodal things, the technical specs, manuals, diagrams, maybe even training videos.

Exactly. If you're trying to build something reliable, something that gives accurate answers, well, the moment you throw in a chart or a complex PDF, accuracy just tanks.

We're looking at some sources today that really diagnose this, and they introduce a toolset called Morphic. They claim it provides the most accurate foundation for these kinds of document-based AI apps.

Right. So our mission here, for you listening, is to get a clear picture of why the standard approach, retrieval-augmented generation, or RAG, often fails when things get serious and you try to scale it up, and then how this newer AI-native toolset, Morphic, aims to fix that: the scaling, the cost, and especially the accuracy, while making it easier for beginners too.

Definitely. We'll get into those fragile pipelines in a moment. But first, just a quick word from our supporter who helps keep these deep dives going. SafeServer ensures the robust hosting of cutting-edge software, like the tools we're discussing, and supports you in your digital transformation. You can find more info at www.safeserver.de.

Okay, so let's start with that baseline: retrieval-augmented generation, RAG. It's pretty much the standard way of grounding large language models, the big AI brains, in real-world data, like your company docs, so they don't just make stuff up.

Yeah. RAG is great for a proof of concept, a quick demo. But the sources we have are really clear on this: those POCs often fail spectacularly in production. And the reason is actually pretty straightforward. It feels like your whole system is held together with digital duct tape.

Duct taping, I saw that mentioned. Like a dozen different tools cobbled together. You've got text extraction over here, OCR doing its thing there, embedding models, vector databases. Each one is a potential breaking point, right? Creating these really fragile pipelines that just break under actual real-world pressure.
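To make that duct-tape picture concrete, here is a minimal sketch of the kind of multi-stage pipeline being described. The stage names are stand-ins for whatever separate tools a team might wire together, not any specific stack; the stubs just mark where each hand-off happens.

# Illustrative only: each function stands in for a separate tool in a
# traditional RAG pipeline, and every hand-off is a spot where layout,
# diagrams, and table structure can silently fall away.
def extract_text(pdf_path): ...        # tool 1: PDF text extraction
def run_ocr(pdf_path): ...             # tool 2: OCR for scanned pages
def chunk(text): ...                   # tool 3: chunking heuristics
def embed(chunks): ...                 # tool 4: embedding model
def store(vectors): ...                # tool 5: vector database write
def retrieve(question): ...            # tool 6: similarity search
def generate(question, passages): ...  # tool 7: LLM call

def answer(pdf_path, question):
    text = extract_text(pdf_path) or run_ocr(pdf_path)  # visuals flattened to text here
    store(embed(chunk(text)))
    return generate(question, retrieve(question))

Seven separate moving parts for a single question-and-answer path, which is roughly what the "digital duct tape" complaint is about.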
Absolutely. And that fragility hurts most when you hit those visually rich documents. The fundamental issue with these traditional pipelines is that they basically treat everything as if it's just plain text, even when it's obviously not.

So, okay, if the pipeline just strips out all the visual context, what happens to something like, say, a wiring diagram? Does it just become a random list of labels?

Pretty much, yes. The crucial visual information is gone. That detailed diagram loses its vital spatial relationships, you know, the fact that component A is connected to component B; that meaning is lost. Or a critical bar chart, maybe showing performance dropping off, just becomes meaningless text fragments to the AI. Tables, oh, tables get mangled into unreadable strings. The system totally misses the headers, the columns, the structure.

Wow. And the result then is pretty bad, because the AI app might be confidently returning wrong answers.

Yeah. It thinks it knows, because it has the text, but it missed the crucial bit in the image or misunderstood the layout.

That sounds like a huge business risk.

It is. And don't forget the cost side. Think about an application trying to answer questions about some massive 500-page equipment manual. The old RAG approach forces the LLM to process and reprocess that huge document over and over for almost every single question. That gets incredibly slow and really, really expensive when you scale it up.
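To get a feel for why that reprocessing gets expensive, here is a rough back-of-the-envelope calculation. Every number in it is an assumption picked for illustration, not a figure from the sources.

# All figures below are illustrative assumptions.
pages = 500
tokens_per_page = 600                     # assumed density for a technical manual
questions_per_day = 1_000
manual_tokens = pages * tokens_per_page   # about 300,000 tokens

reprocess_every_time = manual_tokens * questions_per_day          # 300,000,000 tokens/day
read_once_then_query = manual_tokens + questions_per_day * 500    # about 800,000 tokens/day

print(f"reprocess per question: {reprocess_every_time:,} tokens/day")
print(f"read once, then query:  {read_once_then_query:,} tokens/day")

Even with made-up numbers, the gap is a few orders of magnitude, which is the scaling problem the rest of the conversation circles back to.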
Okay. So if that's the reality of traditional RAG, inaccurate, fragile, expensive, how do we actually build systems that can see the charts properly? This is where Morphic comes in, described as an AI-native toolset. The sources say it provides the most accurate document search and store for building AI apps.

Right. This is the big shift. It's designed end-to-end specifically to store, represent, and search unstructured data. It treats those complex things, PDFs, videos, diagrams, as, well, first-class citizens right from the start. It doesn't try to cram visual data into a text-only box.

Let's get into the features then. How does it actually achieve that accuracy? Multimodal data handling sounds like step one: first-class support for unstructured data. That seems key, because most existing systems kind of choke when you give them a video or a really complex PDF.

And the search itself is smarter. It uses specialized techniques; the sources mention ColPali, used to build search that actually understands the visual content. So you can search across images, PDFs, videos, all sorts of things, using just one single endpoint, because the system gets the meaning of the visuals, not just the text nearby.

Okay. And what about that cost and scaling nightmare you mentioned, the constant reprocessing of giant manuals? They have something called cache-augmented generation. Sounds technical. Can you break down what that actually means for, like, my server costs?

Yeah, absolutely. It lets you create what they call persistent KV caches of your documents. Think of it like this: the LLM reads that whole 500-page manual properly just once. Morphic then takes an index snapshot of the LLM's understanding of that document. It essentially freezes that understanding and saves it, like a super smart sticky note. So the AI doesn't have to reread the entire thing from scratch every single time someone asks a related question.

Ah, so you're pre-processing the intelligence, not just the raw text. That sounds like it would massively speed things up and cut compute costs drastically, avoiding all that repetitive heavy lifting. That's a big deal for a production system.

Huge deal.
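Here is a minimal sketch of what that looks like from the developer's side, based on how the feature is described here. The client and method names are placeholders for illustration; the real Python SDK, mentioned a bit later, may expose this differently.

# Sketch of the cache-augmented-generation flow described above.
# Method names are illustrative placeholders, not a confirmed API.
from morphik import Morphik    # Python SDK import; treat as an assumption

db = Morphik()                 # hosted instance, credentials from the environment

# Ingest the manual once; its "understanding" is frozen into a persistent KV cache.
doc = db.ingest_file("equipment_manual.pdf")
cache = db.create_cache(name="manual-cache", docs=[doc])

# Later questions reuse the cache instead of re-reading all 500 pages.
answer = cache.query("What is the recommended maintenance interval for the pump?")
print(answer)

The important part is the shape of the workflow: one expensive read up front, then cheap queries against the saved cache, which is exactly the cost profile the hosts contrast with naive reprocessing.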
And for developers who need more control, Morphic helps bring structure back to this unstructured mess. Remember we talked about diagrams losing their spatial meaning? Knowledge graphs are the answer there.

Right. So letting users build domain-specific knowledge graphs, and doing it with just a single line of code, that means you're putting the logic back in, turning those mangled strings back into a connected map of how things relate.

Precisely. The AI can then follow those connections logically, which is much more reliable. And alongside that, there's the natural language rules engine. Think of it like defining rules for your unstructured data, but in plain English, not complicated code. You can just tell it how data should be ingested, categorized, and queried. It's like using common sense to structure chaos.

And it also handles metadata extraction, pulling out specifics like bounding boxes from images or classifying parts of a document, quickly and scalably.

Yes. Extracting that crucial low-level detail without adding more fragility to the pipeline.

Okay. It sounds really powerful, but maybe complex. If Morphic is doing all this heavy lifting, visual analysis, knowledge graphs, caching, does that just shift the cost from compute to developer complexity, especially for a beginner?

That's a really fair point. But the aim here is unification. Instead of you juggling, say, 12 different tools, you're managing one integrated system. And they've tried to make getting started pretty easy. If you're a beginner and you just want your AI app to stop being confidently wrong, the hosted option is probably the simplest path.

Okay, tell us more. How does someone actually get started with this?

Easiest way: sign up for the free tier directly on Morphic's site. They say it's a generous free tier, enough to actually build and test things properly before you hit any paywalls. After that, the pricing is transparent and based on your actual compute usage. No complex licenses, just pay for what you use.

And for developers, people who want to code against it?

There's a Python SDK and also a REST API. The examples in the sources look pretty simple: ingesting a file is basically one line of code, and asking a question is straightforward too. You can ask something specific like, what's the height of screw 14A in the chair assembly instructions? And the system does the hard work of finding the diagram, reading it, and pulling out that exact measurement.
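Paraphrasing the kind of example the sources describe, a minimal Python sketch of that workflow might look like the following. The exact class and method names are assumptions for illustration and may differ from the real SDK.

# Illustrative sketch of the described workflow; names are assumptions.
from morphik import Morphik

db = Morphik()

# Ingest a visually rich document in roughly one line.
db.ingest_file("chair_assembly_instructions.pdf")

# Ask a question whose answer lives in a diagram, not in nearby prose.
response = db.query("What's the height of screw 14A in the chair assembly instructions?")
print(response)

# The "single line of code" knowledge graph claim, again with a placeholder name.
graph = db.create_graph(name="assembly-graph")

The REST API presumably mirrors the same ingest and query operations for teams not working in Python.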
What about someone who isn't a coder, maybe not yet, but still needs this kind of accurate understanding from their documents?

There's the Morphic console. It's a web interface. You can just upload files there, connect other data sources, and basically chat with your data, all in the same place. So you get the power of the backend without needing to write code right away.

Good to have options. Now, for folks thinking about self-hosting or larger deployments, we should touch on licensing. That's always critical.

Right. The core product is source-available. It uses the Business Source License 1.1. What that means is it's completely free for personal use or for indie developers. If you're building a commercial product with it, it's still free as long as that deployment makes less than $2,000 per month in gross revenue.

Okay, good to know. And there was one crucial note about updates, something about a migration script.

Ah, yes. Very important detail. If you happened to install Morphic before June 22nd, 2025, you absolutely need to run a specific migration script they provide. It's not just a routine update; it optimizes the authentication system, and they're claiming 70 to 80% faster query performance after running it.

Wow, 70 to 80%. Okay, yeah, you definitely want that speed boost.

Definitely. So, overall, Morphic seems to tackle that core RAG problem head on. It unifies the tools, treats visual and unstructured data properly from the start, and delivers better accuracy and scaling using smart caching, all within a single system.

So to recap for everyone listening: we've seen why just duct-taping standard tools together leads to inaccurate, fragile, and expensive AI pipelines, especially with visual data, and how Morphic uses this AI-native database approach to properly ingest, actually understand, and reliably retrieve info from complex multimodal documents.
And before we wrap up, let's give one final thank you to SafeServer. Again, that's www.safeserver.de. They provide the kind of robust infrastructure that makes hosting advanced software like this possible, really supporting digital transformation.

Okay, so this whole conversation leaves me with a final thought, something for you to mull over. If developers don't have to fight with stitching together a dozen tools anymore for complex documents, and AI can finally, truly understand charts and diagrams accurately, what's the next really complex multimodal data source that's going to become critical for AI to master? Is it maybe highly detailed satellite imagery, or analyzing real-time video feeds from a busy factory floor?

Hmm, something to think about. Definitely food for thought. Until next time, keep digging deep.