Welcome to the Deep Dive. Today, we are jumping right into a really core problem
for modern AI
applications. It's all about getting AI to actually understand the documents you
give it,
especially the tricky ones, you know, visual stuff, complex PDFs, multimodal things,
the technical specs, manuals, diagrams, maybe even training videos.
Exactly. If you're trying to build something reliable, something that gives
accurate answers,
well, you know, the moment you throw in a chart or like a complex PDF, accuracy
just tanks.
We're looking at some sources today that really diagnose this, and they introduce a
toolset called
Morphic. They claim it provides the most accurate foundation for these kinds of
document-based AI
apps. Right. So our mission here, for you listening, is to get a clear picture of
why the standard
approach of retrieval-augmented generation, or RAG, why it often fails when things
get serious,
when you try to scale it up, and then how this newer AI native toolset, Morphic,
how it aims to
fix that, the scaling, the cost, and especially the accuracy, making it easier for
beginners too.
Definitely. We'll get into those fragile pipelines in a moment. But first, just a
quick word from our
supporter who helps keep these deep dives going. Safe Server ensures the robust
hosting of cutting
edge software, like the tools we're discussing. They support you in your digital
transformation.
You can find more info at www.safeserver.de. Okay, so let's start with that
baseline. Retrieval-augmented generation, RAG. It's pretty much the standard way for grounding large
language models,
the big AI brains in real world data, like your company Docs, so they don't just
make stuff up.
Yeah. RAG is great for a proof of concept, a quick demo. But the sources we have
are really
clear on this, those POCs. They often fail spectacularly in production. And the
reason
is actually pretty straightforward. It feels like your whole system is held
together with
digital duct tape. Duct taping. I saw that mentioned. Like a dozen different tools
cobbled
together. You've got text extraction over here, OCR doing its thing there, embedding
models,
vector databases. Each one is a potential breaking point, right? Creating these
really fragile
pipelines that will just break under actual real-world pressure. Absolutely. And
that fragility.
It hurts most when you hit those visually rich documents. The fundamental issue
with
these traditional pipelines is they basically treat everything as if it's just
plain text,
even when it's obviously not. So, okay, if the pipeline just strips out all the
visual context,
what happens to something like, say, a wiring diagram? Does it just become a random
list of
labels? Pretty much, yes. The crucial visual information, gone. That detailed
diagram loses
its vital spatial relationships. You know, the fact that component A is connected
to component B,
that meaning is lost. Or a critical bar chart, maybe showing performance dropping
off.
It just becomes meaningless text fragments to the AI. Tables, oh, tables get mangled
into
unreadable strings. The system totally misses the headers, the columns, the
structure.
Wow. And the result then is pretty bad because the AI app might be confidently
returning wrong
answers. Yeah. It thinks it knows because it has the text, but it missed the
crucial bit in
the image or misunderstood the layout. That sounds like a huge business risk.
It is. And don't forget the cost side. Think about an application trying to answer
questions
about some massive 500-page equipment manual. The old RAG approach forces the LLM to
process and
reprocess that huge document over and over for almost every single question.
That gets incredibly slow and really, really expensive when you scale it up.
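A back-of-the-envelope sketch of that cost (our toy model, not Morphic code): count how many full read-throughs of the manual each strategy needs — reprocessing per question, versus processing once and reusing the result, which is the caching idea the conversation comes back to later:

```python
# Toy cost model: how many times the 500-page manual gets
# fully processed under each strategy.

def expensive_passes(num_questions, use_cache):
    """Count full read-throughs of the manual needed to answer questions."""
    passes = 0
    cache = None
    for _ in range(num_questions):
        if use_cache:
            if cache is None:          # only the first question pays the cost
                cache = "frozen understanding of the manual"
                passes += 1
            # every later question reuses the cached state
        else:
            passes += 1                # reprocess the whole manual each time
    return passes

print(expensive_passes(100, use_cache=False))  # 100 expensive passes
print(expensive_passes(100, use_cache=True))   # 1 expensive pass
```

A hundred questions means a hundred expensive passes in the naive loop — and just one when the understanding is computed once and reused.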
Okay. So if that's the reality of traditional RAG, inaccurate, fragile, expensive,
how do we actually build systems that can see the charts properly? This is where
Morphic comes in,
described as an AI-native toolset. The sources say it provides the most accurate
document search and
store for building AI apps. Right. This is the big shift. It's designed end-to-end
specifically
to store, represent, and search unstructured data. It treats those complex things,
PDFs, videos,
diagrams, as, well, first-class citizens right from the start. It doesn't try to cram
visual
data into a text-only box. Let's get into the features then. How does it actually
achieve that
accuracy? Multi-modal data handling sounds like step one. It offers first-class
support for
unstructured data. That seems key because, yeah, most existing systems kind of
choke when you give
them a video or a really complex PDF. And the search itself is smarter. It uses
specialized
techniques. The sources mentioned something called ColPali to build search that actually
actually
understands the visual content. So you can search across images, PDFs, videos, all
sorts of things
using just one single endpoint because the system gets the meaning of the visuals,
not just the text nearby. Okay. And what about that cost and scaling nightmare you
mentioned,
the constant reprocessing of giant manuals? They have something called cache
augmented generation.
Sounds technical. Can you break down what that actually means for like my server
costs? Yeah,
absolutely. It lets you create what they call persistent KV caches of your
documents. Think of
it like this. The LLM reads that whole 500 page manual properly just once. Morphic
then takes a
perfect index snapshot of the LLM's understanding of that document. It essentially
freezes that
understanding and saves it like a super smart sticky note. So the AI doesn't have
to reread
the entire thing from scratch every single time someone asks a related question. Ah,
so you're pre-processing the intelligence, not just the raw text. That sounds like
it would
massively speed things up and cut down compute costs drastically, avoiding all that
repetitive
heavy lifting. That's a big deal for a production system. Huge deal. And for
developers who need
more control, Morphic helps bring structure back to this unstructured mess.
Remember we talked about
diagrams losing their spatial meaning? Knowledge graphs are the answer there. Right.
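That "component A is connected to component B" idea can be sketched as a tiny graph. This is a toy illustration of the concept, not Morphic's actual knowledge-graph API:

```python
# Toy knowledge graph: the spatial relationships from a wiring
# diagram, made explicit instead of lost in flattened text.

graph = {
    "power_supply": ["fuse"],
    "fuse": ["switch"],
    "switch": ["motor"],
    "motor": [],
}

def is_connected(graph, start, target):
    """Follow the connections logically: can we reach target from start?"""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return False

print(is_connected(graph, "power_supply", "motor"))  # True
print(is_connected(graph, "motor", "fuse"))          # False
```

Once the relationships are explicit like this, a question such as "does the power supply feed the motor?" becomes a traversal instead of a guess over text fragments.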
So letting
users build domain-specific knowledge graphs and doing it with just a single line
of code,
that means you're putting the logic back in, turning those mangled strings back
into a
connected map of how things relate. Precisely. The AI can then follow those
connections logically,
much more reliable. And alongside that, there's the natural language rules engine.
Think of it like defining rules for your unstructured data, but using plain English,
not complicated code. You can just tell it how data should be ingested, categorized,
queried.
It's like using common sense to structure chaos. And it also handles metadata
extraction,
pulling out specifics like bounding boxes from images or classifying parts of a
document
quickly and scalably. Yes. Extracting that crucial low-level detail without adding
more
fragility to the pipeline. Okay. It sounds really powerful, but maybe complex. If
Morphic is doing
all this heavy lifting visual analysis, knowledge graphs, caching, does that just
shift the cost
from compute to developer complexity, especially for a beginner? That's a really
fair point. But
the aim here is unification. Instead of you juggling, say, 12 different tools, you're
managing
one integrated system. And they've tried to make getting started pretty easy. If
you're a beginner
and you just want your AI app to stop being confidently wrong, the hosted option is
probably
the simplest path. Okay. Tell us more. How does someone actually get started with
this? Easiest
way. Sign up for the free tier directly on Morphic's site. They say it's a generous
free tier,
enough to actually build and test things properly before you hit any paywalls.
After that, the
pricing is transparent based on your actual compute usage. No complex licenses.
Just pay
for what you use. And for developers, people who want to code against it. There's a
Python SDK
and also a REST API. The examples in the sources look pretty simple, like ingesting
a file,
basically one line of code, and asking a question is straightforward, too. You can
ask something
specific like, what's the height of screw 14A in the chair assembly instructions?
And the system
does the hard work of finding the diagram, reading it, and pulling out that exact
measurement.
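That ingest-then-ask pattern can be sketched with a toy in-memory stand-in. To be clear, this is NOT Morphic's real SDK — the class and the `ingest_file`/`query` method names here are our assumptions for illustration only:

```python
# Toy stand-in (NOT Morphic's real SDK): illustrates the
# "ingest in one line, then ask questions" pattern.

class ToyDocStore:
    def __init__(self):
        self.docs = {}

    def ingest_file(self, name, text):
        """'One line' ingestion: store a document under its name."""
        self.docs[name] = text

    def query(self, question):
        """Naive keyword-overlap lookup: return the first ingested line
        sharing a word with the question (a real system would do far more)."""
        words = {w.lower().strip("?") for w in question.split()}
        for text in self.docs.values():
            for line in text.splitlines():
                if words & {w.lower() for w in line.split()}:
                    return line.strip()
        return "not found"

db = ToyDocStore()
db.ingest_file("chair_manual.txt",
               "screw 14A: height 12 mm\nscrew 14B: height 8 mm")
print(db.query("What is the height of screw 14A?"))  # screw 14A: height 12 mm
```

The real system, of course, is doing the genuinely hard part behind that same shape of interface: finding the right diagram, reading it visually, and extracting the measurement.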
What about someone who isn't a coder, maybe yet, but still needs this kind of
accurate understanding
from their documents? There's the Morphic console. It's a web interface. You can
just upload files
there, connect to other data sources, and basically chat with your data all within
the same place.
So you get the power of the backend without needing to write code right away.
Good to have options. Now, for folks thinking about self-hosting or larger deployments,
we should touch on licensing. That's always critical. Right. The core product is
source
available. It uses the Business Source License 1.1. What that means is it's
completely free
for personal use or for indie developers. If you're building a commercial product
with it,
it's still free as long as that deployment makes less than $2,000 per month in
gross revenue.
Okay. Good to know. And there was one crucial note about updates,
something about a migration script. Ah, yes. Very important detail.
If you happen to install Morphic before June 22nd, 2025, you absolutely need to run
a specific
migration script they provide. It's not just a routine update, it optimizes the
authentication
system. They're claiming 70 to 80% faster query performance after running it.
Wow. 70 to 80%? Okay. Yeah. You definitely want that speed boost.
Definitely capture that. So yeah, overall, Morphic seems to tackle that core RAG
problem head on.
It unifies the tools, treats visual and unstructured data properly from the start,
and delivers better accuracy and scaling using smart caching within a single system.
So to recap for everyone listening, we've seen why just duct taping standard tools
together leads to
inaccurate, fragile, and expensive AI pipelines, especially with visual data, and
how Morphic uses
this AI native database approach to properly ingest, actually understand, and
reliably retrieve
info from complex multimodal documents. And before we wrap up, let's give one
final thank you to SafeServer. Again, that's www.safeserver.de. They provide the
kind of
robust infrastructure that makes hosting advanced software like this possible,
really supporting
digital transformation. Okay, so this whole conversation leaves me with a final
thought,
something for you to mull over. If developers don't have to fight with stitching
together a dozen
tools anymore for complex documents, and AI can finally truly understand charts and
diagrams
accurately, what's the next really complex multimodal data source that's going to
become
critical for AI to master? Is it maybe like highly detailed satellite imagery or
analyzing
real-time video feeds from a busy factory floor? Hmm, something to think about.
Food for thought. Until next time, keep digging deep.