Today's Deep-Dive: Morphik
Ep. 277

Episode description

This episode discusses the limitations of traditional Retrieval Augmented Generation (RAG) systems in handling complex, multimodal documents, such as PDFs with diagrams or technical manuals. These systems often fail in production due to their fragile pipelines, which cobble together multiple tools for text extraction, OCR, and embedding, leading to a loss of crucial visual and spatial information. This inaccuracy, coupled with the high cost of reprocessing large documents, makes scaling RAG challenging. The episode introduces Morphik, an AI-native toolset designed to address these issues. Morphik offers first-class support for unstructured and multimodal data, using specialized search techniques that understand visual content. It also employs cache-augmented generation, allowing AI models to index documents once and retain that understanding, significantly reducing processing time and costs. Furthermore, Morphik facilitates the creation of domain-specific knowledge graphs and uses a natural-language rules engine for data ingestion and querying, aiming to simplify the development process. For beginners, Morphik offers a hosted option and a generous free tier, with transparent pay-as-you-go pricing. Developers can use a Python SDK or REST API, while a web console provides a no-code interface. The core product is source-available under the Business Source License, free for personal use and for commercial use up to $2,000 in gross revenue per month. A crucial note for existing installs: a migration script must be run to gain significant query performance improvements. Ultimately, Morphik aims to provide a more accurate, scalable, and cost-effective solution for building AI applications that can truly understand complex documents.

Gain digital sovereignty now and save costs

Let’s have a look at your digital challenges together. What tools are you currently using? Are your processes optimal? How is the state of backups and security updates?

Digital sovereignty is easily achieved with Open Source software (which usually costs far less, too). Our division Safeserver offers hosting, operation and maintenance for countless Free and Open Source tools.

Try it now for 1 Euro - 30 days free!

Download transcript (.srt)
0:00

Welcome to the Deep Dive. Today, we are jumping right into a really core problem

0:05

for modern AI

0:06

applications. It's all about getting AI to actually understand the documents you

0:10

give it,

0:11

especially the tricky ones, you know, visual stuff, complex PDFs, multimodal things,

0:16

the technical specs, manuals, diagrams, maybe even training videos.

0:20

Exactly. If you're trying to build something reliable, something that gives

0:23

accurate answers,

0:23

well, you know, the moment you throw in a chart or like a complex PDF, accuracy

0:29

just tanks.

0:30

We're looking at some sources today that really diagnose this, and they introduce a

0:33

toolset called

0:34

Morphik. They claim it provides the most accurate foundation for these kinds of

0:39

document-based AI

0:40

apps. Right. So our mission here, for you listening, is to get a clear picture of

0:44

why the standard

0:45

approach of retrieval, augmented generation, or RAG, why it often fails when things

0:50

get serious,

0:50

when you try to scale it up, and then how this newer AI-native toolset, Morphik,

0:55

how it aims to

0:55

fix that, the scaling, the cost, and especially the accuracy, making it easier for

0:59

beginners too.

1:00

Definitely. We'll get into those fragile pipelines in a moment. But first, just a

1:04

quick word from our

1:04

supporter who helps keep these deep dives going. Safe Server ensures the robust

1:08

hosting of cutting

1:09

edge software, like the tools we're discussing. They support you in your digital

1:13

transformation.

1:14

You can find more info at www.safeserver.de. Okay, so let's start with that

1:20

baseline. Retrieval,

1:22

augmented generation, RAG. It's pretty much the standard way for grounding large

1:28

language models,

1:29

the big AI brains in real world data, like your company Docs, so they don't just

1:34

make stuff up.
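The baseline pipeline being described can be sketched in a few lines. This is a deliberately naive illustration, with a toy keyword retriever standing in for the embedding model and vector database (none of the names here are any real library's API):

```python
# Minimal sketch of the classic RAG loop: retrieve relevant docs,
# then build a grounded prompt for the LLM. Illustrative only.

def retrieve(query, docs, k=1):
    """Score each doc by word overlap with the query; return the top k."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def answer(query, docs):
    """'Generate' by prepending retrieved context -- a real system
    would pass this prompt to an LLM instead of returning it."""
    context = " ".join(retrieve(query, docs))
    return f"Context: {context} | Question: {query}"

docs = [
    "The pump motor draws 4 amps at full load.",
    "Replace the air filter every 90 days.",
]
print(answer("How many amps does the pump motor draw?", docs))
```

Note what this sketch quietly assumes: the documents are already clean text. That assumption is exactly what breaks with diagrams, charts, and tables, as discussed next.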

1:36

Yeah. RAG is great for a proof of concept, a quick demo. But the sources we have

1:40

are really

1:41

clear on this, those POCs. They often fail spectacularly in production. And the

1:46

reason

1:46

is actually pretty straightforward. It feels like your whole system is held

1:49

together with

1:50

digital duct tape. Duct taping. I saw that mentioned. Like a dozen different tools

1:54

cobbled

1:55

together. You've got text extraction over here, OCR doing its thing there, embedding

1:59

models,

1:59

vector databases. Each one is a potential breaking point, right? Creating these

2:03

really fragile

2:04

pipelines that just will break under actual real world pressure. Absolutely. And

2:08

that fragility.

2:09

It hurts most when you hit those visually rich documents. The fundamental issue

2:13

with

2:14

these traditional pipelines is they basically treat everything as if it's just

2:17

plain text,

2:18

even when it's obviously not. So, okay, if the pipeline just strips out all the

2:23

visual context,

2:24

what happens to something like, say, a wiring diagram? Does it just become a random

2:30

list of

2:30

labels? Pretty much, yes. The crucial visual information, gone. That detailed

2:35

diagram loses

2:36

its vital spatial relationships. You know, the fact that component A is connected

2:39

to component B,

2:41

that meaning is lost. Or a critical bar chart, maybe showing performance dropping

2:45

off.

2:45

It just becomes meaningless text fragments to the AI. Tables, oh, tables get mangled

2:50

into

2:50

unreadable strings. The system totally misses the headers, the columns, the

2:54

structure.

2:54

Wow. And the result then is pretty bad because the AI app might be confidently

2:59

returning wrong

2:59

answers. Yeah. It thinks it knows because it extracted some text, but it missed the

3:04

crucial bit in

3:04

the image or misunderstood the layout. That sounds like a huge business risk.

3:08

It is. And don't forget the cost side. Think about an application trying to answer

3:14

questions

3:14

about some massive 500-page equipment manual. The old RAG approach forces the LLM to

3:19

process and

3:20

reprocess that huge document over and over for almost every single question.

3:24

That gets incredibly slow and really, really expensive when you scale it up.
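The cost problem being described is easy to put numbers on. Here is a back-of-envelope sketch with assumed token counts and prices (not real Morphik or provider figures):

```python
# Rough illustration of reprocessing cost vs. reading a document once.
# All numbers below are assumptions for the sake of the arithmetic.

PAGES = 500
TOKENS_PER_PAGE = 500          # assumed density
PRICE_PER_1K_TOKENS = 0.01     # assumed input price, USD
QUERIES = 10_000               # lifetime questions against the manual

manual_tokens = PAGES * TOKENS_PER_PAGE                      # 250,000 tokens
cost_per_read = manual_tokens / 1000 * PRICE_PER_1K_TOKENS   # $2.50 per pass

reprocess_every_query = cost_per_read * QUERIES  # naive RAG: pay per question
read_once_then_cache = cost_per_read             # cached: pay once (+ storage)

print(f"naive: ${reprocess_every_query:,.2f}  cached: ${read_once_then_cache:,.2f}")
# → naive: $25,000.00  cached: $2.50
```

The exact figures do not matter; the point is that the naive cost scales with the number of questions, while the cached cost does not.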

3:28

Okay. So if that's the reality of traditional RAG, inaccurate, fragile, expensive,

3:34


3:34

how do we actually build systems that can see the charts properly? This is where

3:39

Morphik comes in,

3:40

described as an AI-native toolset. The sources say it provides the most accurate

3:45

document search and

3:45

store for building AI apps. Right. This is the big shift. It's designed end-to-end

3:50

specifically

3:50

to store, represent, and search unstructured data. It treats those complex things,

3:55

PDFs, videos,

3:56

diagrams, as, well, first-class citizens right from the start. It doesn't try to cram

4:00

visual

4:00

data into a text-only box. Let's get into the features then. How does it actually

4:04

achieve that

4:04

accuracy? Multi-modal data handling sounds like step one. It offers first-class

4:09

support for

4:10

unstructured data. That seems key because, yeah, most existing systems kind of

4:16

choke when you give

4:17

them a video or a really complex PDF. And the search itself is smarter. It uses

4:21

specialized

4:21

techniques. The sources mention something called ColPali to build search that

4:25

actually

4:25

understands the visual content. So you can search across images, PDFs, videos, all

4:30

sorts of things

4:31

using just one single endpoint because the system gets the meaning of the visuals,

4:36

not just the text nearby. Okay. And what about that cost and scaling nightmare you

4:40

mentioned,

4:41

the constant reprocessing of giant manuals? They have something called cache

4:45

augmented generation.

4:46

Sounds technical. Can you break down what that actually means for like my server

4:49

costs? Yeah,

4:50

absolutely. It lets you create what they call persistent KV caches of your

4:55

documents. Think of

4:57

it like this. The LLM reads that whole 500-page manual properly just once. Morphik

5:03

then takes a

5:04

perfect index snapshot of the LLM's understanding of that document. It essentially

5:09

freezes that

5:10

understanding and saves it like a super smart sticky note. So the AI doesn't have

5:15

to reread

5:16

the entire thing from scratch every single time someone asks a related question. Ah,

5:21

so you're pre-processing the intelligence, not just the raw text. That sounds like

5:25

it would

5:25

massively speed things up and cut down compute costs drastically, avoiding all that

5:30

repetitive

5:31

heavy lifting. That's a big deal for a production system. Huge deal. And for

5:35

developers who need

5:35

more control, Morphik helps bring structure back to this unstructured mess.

5:40

Remember we talked about

5:41

diagrams losing their spatial meaning? Knowledge graphs are the answer there. Right.

5:45

So letting

5:45

users build domain-specific knowledge graphs and doing it with just a single line

5:48

of code,

5:49

that means you're putting the logic back in, turning those mangled strings back

5:53

into a

5:53

connected map of how things relate. Precisely. The AI can then follow those

5:57

connections logically,

5:59

much more reliable. And alongside that, there's the natural language rules engine.
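The knowledge-graph idea described above can be sketched generically: recover "A is connected to B" statements as graph edges the AI can then follow. This is an illustrative stand-in, not Morphik's actual one-line graph API:

```python
# Toy knowledge-graph builder: turn "X is connected to Y" sentences
# into an adjacency map, restoring the spatial relationships that a
# text-only pipeline would flatten away. Illustrative only.

import re
from collections import defaultdict

def build_graph(sentences):
    graph = defaultdict(set)
    for s in sentences:
        m = re.match(r"(\w+) is connected to (\w+)", s)
        if m:
            a, b = m.groups()
            graph[a].add(b)
            graph[b].add(a)  # treat wiring as undirected
    return graph

g = build_graph([
    "relay1 is connected to fuse3",
    "fuse3 is connected to motor2",
])
# Follow the connections logically: what touches fuse3?
print(sorted(g["fuse3"]))
# → ['motor2', 'relay1']
```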

6:03

Think of it like defining rules for your unstructured data, but using plain English,

6:08

not complicated code. You can just tell it how data should be ingested, categorized,

6:12

queried.

6:13

It's like using common sense to structure chaos. And it also handles metadata

6:18

extraction,

6:19

pulling out specifics like bounding boxes from images or classifying parts of a

6:24

document

6:24

quickly and scalably. Yes. Extracting that crucial low-level detail without adding

6:29

more

6:29

fragility to the pipeline. Okay. It sounds really powerful, but maybe complex. If

6:35

Morphik is doing

6:35

all this heavy lifting, visual analysis, knowledge graphs, caching, does that just

6:39

shift the cost

6:40

from compute to developer complexity, especially for a beginner? That's a really

6:45

fair point. But

6:46

the aim here is unification. Instead of you juggling, say, 12 different tools, you're

6:50

managing

6:50

one integrated system. And they've tried to make getting started pretty easy. If

6:55

you're a beginner

6:55

and you just want your AI app to stop being confidently wrong, the hosted option is

7:00

probably

7:00

the simplest path. Okay. Tell us more. How does someone actually get started with

7:04

this? Easiest

7:05

way. Sign up for the free tier directly on Morphik's site. They say it's a generous

7:09

free tier,

7:10

enough to actually build and test things properly before you hit any paywalls.

7:14

After that, the

7:15

pricing is transparent based on your actual compute usage. No complex licenses.

7:20

Just pay

7:21

for what you use. And for developers, people who want to code against it. There's a

7:25

Python SDK

7:26

and also a REST API. The examples in the sources look pretty simple, like ingesting

7:31

a file,

7:31

basically one line of code, and asking a question is straightforward, too. You can

7:36

ask something

7:36

specific like, what's the height of screw 14A in the chair assembly instructions?

7:41

And the system

7:42

does the hard work of finding the diagram, reading it, and pulling out that exact

7:46

measurement.
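The SDK calls described here might take roughly this shape. The class and method names (`Morphik`, `ingest_file`, `query`) are assumptions for illustration, backed by a tiny in-memory stub so the sketch runs as-is:

```python
# Hypothetical shape of the "one line to ingest, one line to ask" flow.
# This stub is NOT the real SDK; it only mimics the call pattern.

class Morphik:  # stand-in class, names assumed for illustration
    def __init__(self):
        self.docs = {}

    def ingest_file(self, path, text):
        self.docs[path] = text  # the real system would parse and index the file

    def query(self, question):
        # Naive lookup: return the first doc mentioning a query word.
        for text in self.docs.values():
            if any(w in text for w in question.lower().split()):
                return text
        return "no match"

db = Morphik()
db.ingest_file("chair_manual.pdf", "screw 14a height: 40 mm")  # one-line ingest
print(db.query("height of screw 14a"))
```

The point of the pattern, per the sources, is that the hard part (finding the diagram, reading it, extracting the measurement) happens behind those two calls.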

7:47

What about someone who isn't a coder, maybe yet, but still needs this kind of

7:51

accurate understanding

7:52

from their documents? There's the Morphik console. It's a web interface. You can

7:55

just upload files

7:56

there, connect to other data sources, and basically chat with your data all within

8:00

the same place.

8:01

So you get the power of the backend without needing to write code right away.

8:05

Good to have options. Now, for folks thinking about self-hosting or larger deployments,

8:11

we should touch on licensing. That's always critical. Right. The core product is

8:15

source

8:15

available. It uses the business source license 1.1. What that means is it's

8:19

completely free

8:20

for personal use or for indie developers. If you're building a commercial product

8:25

with it,

8:25

it's still free as long as that deployment makes less than $2,000 per month in

8:29

gross revenue.

8:30

Okay. Good to know. And there was one crucial note about updates,

8:34

something about a migration script. Ah, yes. Very important detail.

8:38

If you happen to install Morphik before June 22nd, 2025, you absolutely need to run

8:43

a specific

8:44

migration script they provide. It's not just a routine update, it optimizes the

8:48

authentication

8:48

system. They're claiming a 70, 80% faster query performance after running it.

8:53

Wow. 70, 80%. Okay. Yeah. You definitely want that speed boost.

8:56

Definitely capture that. So yeah, overall, Morphik seems to tackle that core RAG

9:01

problem head on.

9:01

It unifies the tools, treats visual and unstructured data properly from the start,

9:06

and delivers better accuracy and scaling using smart caching within a single system.

9:10

So to recap for everyone listening, we've seen why just duct taping standard tools

9:15

together leads to

9:16

inaccurate, fragile, and expensive AI pipelines, especially with visual data, and

9:21

how Morphik uses

9:22

this AI native database approach to properly ingest, actually understand, and

9:27

reliably retrieve

9:29

info from complex multimodal documents. And before we wrap up, let's give one

9:33

final thank you to SafeServer. Again, that's www.safeserver.de. They provide the

9:39

kind of

9:40

robust infrastructure that makes hosting advanced software like this possible,

9:44

really supporting

9:45

digital transformation. Okay, so this whole conversation leaves me with a final

9:48

thought,

9:48

something for you to mull over. If developers don't have to fight with stitching

9:52

together a dozen

9:53

tools anymore for complex documents, and AI can finally truly understand charts and

9:58

diagrams

9:58

accurately, what's the next really complex multimodal data source that's going to

10:02

become

10:02

critical for AI to master? Is it maybe like highly detailed satellite imagery or

10:08

analyzing

10:08

real-time video feeds from a busy factory floor? Hmm, something to think about. Food

10:13

for thought. Until next time, keep digging deep.
