Welcome to the Deep Dive. Today we are jumping right into a really core problem for modern AI applications. It's all about getting AI to actually understand the documents you give it, especially the tricky ones, you know, visual stuff, complex PDFs, multimodal things, the technical specs, manuals, diagrams, maybe even training videos.

Exactly. If you're trying to build something reliable, something that gives accurate answers, well, the moment you throw in a chart or a complex PDF, accuracy just tanks.

We're looking at some sources today that really diagnose this, and they introduce a toolset called Morphic. They claim it provides the most accurate foundation for these kinds of document-based AI apps.

Right. So our mission here, for you listening, is to get a clear picture of why the standard approach, retrieval-augmented generation, or RAG, often fails when things get serious and you try to scale it up, and then how this newer AI-native toolset, Morphic, aims to fix that: the scaling, the cost, and especially the accuracy, while making it easier for beginners too.

Definitely. We'll get into those fragile pipelines in a moment. But first, just a quick word from our supporter who helps keep these deep dives going. SafeServer ensures the robust hosting of cutting-edge software, like the tools we're discussing, and supports you in your digital transformation. You can find more info at www.safeserver.de.

Okay, so let's start with that baseline: retrieval-augmented generation, RAG. It's pretty much the standard way of grounding large language models, the big AI brains, in real-world data, like your company docs, so they don't just make stuff up.

Yeah. RAG is great for a proof of concept, a quick demo. But the sources we have are really clear on this: those POCs often fail spectacularly in production. And the reason is actually pretty straightforward. It feels like your whole system is held together with digital duct tape.

Duct taping, I saw that mentioned. Like a dozen different tools cobbled together. You've got text extraction over here, OCR doing its thing there, embedding models, vector databases. Each one is a potential breaking point, right? Creating these really fragile pipelines that just break under actual real-world pressure.
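To make that duct-tape picture concrete, here is a minimal sketch of the kind of multi-stage pipeline being described. The stage names are stand-ins for whatever separate tools a team might wire together, not any specific stack; the stubs just mark where each hand-off happens.

# Illustrative only: each function stands in for a separate tool in a
# traditional RAG pipeline, and every hand-off is a spot where layout,
# diagrams, and table structure can silently fall away.
def extract_text(pdf_path): ...        # tool 1: PDF text extraction
def run_ocr(pdf_path): ...             # tool 2: OCR for scanned pages
def chunk(text): ...                   # tool 3: chunking heuristics
def embed(chunks): ...                 # tool 4: embedding model
def store(vectors): ...                # tool 5: vector database write
def retrieve(question): ...            # tool 6: similarity search
def generate(question, passages): ...  # tool 7: LLM call

def answer(pdf_path, question):
    text = extract_text(pdf_path) or run_ocr(pdf_path)  # visuals flattened to text here
    store(embed(chunk(text)))
    return generate(question, retrieve(question))

Seven separate moving parts for a single question-and-answer path, which is roughly what the "digital duct tape" complaint is about.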
Absolutely. And that fragility hurts most when you hit those visually rich documents. The fundamental issue with these traditional pipelines is that they basically treat everything as if it's just plain text, even when it's obviously not.

So, okay, if the pipeline just strips out all the visual context, what happens to something like, say, a wiring diagram? Does it just become a random list of labels?

Pretty much, yes. The crucial visual information is gone. That detailed diagram loses its vital spatial relationships, you know, the fact that component A is connected to component B; that meaning is lost. Or a critical bar chart, maybe showing performance dropping off, just becomes meaningless text fragments to the AI. Tables, oh, tables get mangled into unreadable strings. The system totally misses the headers, the columns, the structure.

Wow. And the result then is pretty bad, because the AI app might be confidently returning wrong answers.

Yeah. It thinks it knows, because it has the text, but it missed the crucial bit in the image or misunderstood the layout.

That sounds like a huge business risk.

It is. And don't forget the cost side. Think about an application trying to answer questions about some massive 500-page equipment manual. The old RAG approach forces the LLM to process and reprocess that huge document over and over for almost every single question. That gets incredibly slow and really, really expensive when you scale it up.
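To get a feel for why that reprocessing gets expensive, here is a rough back-of-the-envelope calculation. Every number in it is an assumption picked for illustration, not a figure from the sources.

# All figures below are illustrative assumptions.
pages = 500
tokens_per_page = 600                     # assumed density for a technical manual
questions_per_day = 1_000
manual_tokens = pages * tokens_per_page   # about 300,000 tokens

reprocess_every_time = manual_tokens * questions_per_day          # 300,000,000 tokens/day
read_once_then_query = manual_tokens + questions_per_day * 500    # about 800,000 tokens/day

print(f"reprocess per question: {reprocess_every_time:,} tokens/day")
print(f"read once, then query:  {read_once_then_query:,} tokens/day")

Even with made-up numbers, the gap is a few orders of magnitude, which is the scaling problem the rest of the conversation circles back to.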
Okay. So if that's the reality of traditional RAG, inaccurate, fragile, expensive, how do we actually build systems that can see the charts properly? This is where Morphic comes in, described as an AI-native toolset. The sources say it provides the most accurate document search and store for building AI apps.

Right. This is the big shift. It's designed end-to-end specifically to store, represent, and search unstructured data. It treats those complex things, PDFs, videos, diagrams, as, well, first-class citizens right from the start. It doesn't try to cram visual data into a text-only box.

Let's get into the features then. How does it actually achieve that accuracy? Multimodal data handling sounds like step one: first-class support for unstructured data. That seems key, because most existing systems kind of choke when you give them a video or a really complex PDF.

And the search itself is smarter. It uses specialized techniques; the sources mention ColPali, used to build search that actually understands the visual content. So you can search across images, PDFs, videos, all sorts of things, using just one single endpoint, because the system gets the meaning of the visuals, not just the text nearby.

Okay. And what about that cost and scaling nightmare you mentioned, the constant reprocessing of giant manuals? They have something called cache-augmented generation. Sounds technical. Can you break down what that actually means for, like, my server costs?

Yeah, absolutely. It lets you create what they call persistent KV caches of your documents. Think of it like this: the LLM reads that whole 500-page manual properly just once. Morphic then takes an index snapshot of the LLM's understanding of that document. It essentially freezes that understanding and saves it, like a super smart sticky note. So the AI doesn't have to reread the entire thing from scratch every single time someone asks a related question.

Ah, so you're pre-processing the intelligence, not just the raw text. That sounds like it would massively speed things up and cut compute costs drastically, avoiding all that repetitive heavy lifting. That's a big deal for a production system.

Huge deal.
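Here is a minimal sketch of what that looks like from the developer's side, based on how the feature is described here. The client and method names are placeholders for illustration; the real Python SDK, mentioned a bit later, may expose this differently.

# Sketch of the cache-augmented-generation flow described above.
# Method names are illustrative placeholders, not a confirmed API.
from morphik import Morphik    # Python SDK import; treat as an assumption

db = Morphik()                 # hosted instance, credentials from the environment

# Ingest the manual once; its "understanding" is frozen into a persistent KV cache.
doc = db.ingest_file("equipment_manual.pdf")
cache = db.create_cache(name="manual-cache", docs=[doc])

# Later questions reuse the cache instead of re-reading all 500 pages.
answer = cache.query("What is the recommended maintenance interval for the pump?")
print(answer)

The important part is the shape of the workflow: one expensive read up front, then cheap queries against the saved cache, which is exactly the cost profile the hosts contrast with naive reprocessing.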
And for developers who need more control, Morphic helps bring structure back to this unstructured mess. Remember we talked about diagrams losing their spatial meaning? Knowledge graphs are the answer there.

Right. So letting users build domain-specific knowledge graphs, and doing it with just a single line of code, that means you're putting the logic back in, turning those mangled strings back into a connected map of how things relate.

Precisely. The AI can then follow those connections logically, which is much more reliable. And alongside that, there's the natural language rules engine. Think of it like defining rules for your unstructured data, but in plain English, not complicated code. You can just tell it how data should be ingested, categorized, and queried. It's like using common sense to structure chaos.

And it also handles metadata extraction, pulling out specifics like bounding boxes from images or classifying parts of a document, quickly and scalably.

Yes. Extracting that crucial low-level detail without adding more fragility to the pipeline.

Okay. It sounds really powerful, but maybe complex. If Morphic is doing all this heavy lifting, visual analysis, knowledge graphs, caching, does that just shift the cost from compute to developer complexity, especially for a beginner?

That's a really fair point. But the aim here is unification. Instead of you juggling, say, 12 different tools, you're managing one integrated system. And they've tried to make getting started pretty easy. If you're a beginner and you just want your AI app to stop being confidently wrong, the hosted option is probably the simplest path.

Okay, tell us more. How does someone actually get started with this?

Easiest way: sign up for the free tier directly on Morphic's site. They say it's a generous free tier, enough to actually build and test things properly before you hit any paywalls. After that, the pricing is transparent and based on your actual compute usage. No complex licenses, just pay for what you use.

And for developers, people who want to code against it?

There's a Python SDK and also a REST API. The examples in the sources look pretty simple: ingesting a file is basically one line of code, and asking a question is straightforward too. You can ask something specific like, what's the height of screw 14A in the chair assembly instructions? And the system does the hard work of finding the diagram, reading it, and pulling out that exact measurement.
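Paraphrasing the kind of example the sources describe, a minimal Python sketch of that workflow might look like the following. The exact class and method names are assumptions for illustration and may differ from the real SDK.

# Illustrative sketch of the described workflow; names are assumptions.
from morphik import Morphik

db = Morphik()

# Ingest a visually rich document in roughly one line.
db.ingest_file("chair_assembly_instructions.pdf")

# Ask a question whose answer lives in a diagram, not in nearby prose.
response = db.query("What's the height of screw 14A in the chair assembly instructions?")
print(response)

# The "single line of code" knowledge graph claim, again with a placeholder name.
graph = db.create_graph(name="assembly-graph")

The REST API presumably mirrors the same ingest and query operations for teams not working in Python.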
What about someone who isn't a coder, maybe not yet, but still needs this kind of accurate understanding from their documents?

There's the Morphic console. It's a web interface. You can just upload files there, connect other data sources, and basically chat with your data, all in the same place. So you get the power of the backend without needing to write code right away.

Good to have options. Now, for folks thinking about self-hosting or larger deployments, we should touch on licensing. That's always critical.

Right. The core product is source-available. It uses the Business Source License 1.1. What that means is it's completely free for personal use or for indie developers. If you're building a commercial product with it, it's still free as long as that deployment makes less than $2,000 per month in gross revenue.

Okay, good to know. And there was one crucial note about updates, something about a migration script.

Ah, yes. Very important detail. If you happened to install Morphic before June 22nd, 2025, you absolutely need to run a specific migration script they provide. It's not just a routine update; it optimizes the authentication system, and they're claiming 70 to 80% faster query performance after running it.

Wow, 70 to 80%. Okay, yeah, you definitely want that speed boost.

Definitely. So, overall, Morphic seems to tackle that core RAG problem head on. It unifies the tools, treats visual and unstructured data properly from the start, and delivers better accuracy and scaling using smart caching, all within a single system.

So to recap for everyone listening: we've seen why just duct-taping standard tools together leads to inaccurate, fragile, and expensive AI pipelines, especially with visual data, and how Morphic uses this AI-native database approach to properly ingest, actually understand, and reliably retrieve info from complex multimodal documents.
And before we wrap up, let's give one final thank you to SafeServer. Again, that's www.safeserver.de. They provide the kind of robust infrastructure that makes hosting advanced software like this possible, really supporting digital transformation.

Okay, so this whole conversation leaves me with a final thought, something for you to mull over. If developers don't have to fight with stitching together a dozen tools anymore for complex documents, and AI can finally, truly understand charts and diagrams accurately, what's the next really complex multimodal data source that's going to become critical for AI to master? Is it maybe highly detailed satellite imagery, or analyzing real-time video feeds from a busy factory floor?

Hmm, something to think about. Definitely food for thought. Until next time, keep digging deep.