Today's Deep-Dive: karakeep

0:00

Welcome back to the deep dive.

0:01

This is where we cut through the noise

0:03

and focus on what really matters from all the stuff we read

0:06

and research.

0:07

Today, we're tackling a very, very modern problem.

0:10

Just digital clutter, right?

0:12

All the links, notes, PDFs we save.

0:16

We're diving into a tool called karakeep.

0:18

Yeah, karakeep.

0:19

It used to be called Hoarder.

0:20

Some listeners might remember that name.

0:22

Right, Hoarder.

0:23

And look, this isn't just another bookmark app.

0:25

We're looking at something different here.

0:26

It's self-hosted, and it actually

0:28

uses AI to help organize your own personal digital stuff.

0:32

The real insight, I think, that we're exploring

0:34

is how these powerful tools like AI organization, which usually

0:38

live in the cloud on company servers,

0:40

are kind of moving back, back into our own hands.

0:43

It's a definite trend.

0:44

OK, but before we really unpack karakeep,

0:46

we want to give a quick shout out to our supporter

0:48

for this deep dive, safeserver.de.

0:50

They help with hosting for software exactly like this,

0:53

and they support you in your digital transformation journey.

0:56

You can find out more at www.safeserver.de.

1:04

So karakeep, as you said, it was Hoarder,

1:07

and it's really built for, well, people like us maybe.

1:11

The data hoarders.

1:12

Guilty as charged.

1:13

We save everything, links, notes, images, you name it.

1:17

But the trick is finding it again later.

1:19

That's the hard part.

1:20

Totally.

1:20

And karakeep's angle is that it's self-hostable completely,

1:25

which means you run it, you control the server,

1:27

you control where your data goes.

1:29

And that's the hook, isn't it?

1:30

Because we've seen tons of read-it-later apps, Pocket,

1:33

Instapaper, all those.

1:35

But karakeep stands out because of, well,

1:38

two things mainly, that AI integration for organizing

1:43

things automatically and how it fights back against link rot.

1:47

Yeah, link rot is a killer for archives.

1:49

So it's more than just saving links.

1:50

It's like building an intelligent personal archive

1:53

that hopefully lasts.

1:54

Exactly.

1:54

And for anyone listening who's maybe newer to this,

1:57

let's just nail down what self-hostable really means here.

2:00

Good idea.

2:00

It's not like signing up for Spotify or something.

2:02

You're not using someone else's service.

2:04

You install karakeep on your own computer or maybe

2:07

a server you rent.

2:08

So you're in charge.

2:09

Totally in charge.

2:10

You decide the rules.

2:11

You control the data.

2:12

You're not depending on some big tech company

2:14

to keep your personal knowledge base safe or accessible.

2:18

It's about independence.

2:19

It's like instead of using the public library

2:21

for your most vital notes, you build your own secure vault

2:24

for them.

2:24

That's a great analogy.

2:25

And speaking of custom things, the name Karakeep,

2:29

it tells you a lot about the philosophy behind it.

2:31

Oh, yeah.

2:32

What's the story there?

2:33

Well, Karakeep comes from Arabic.

2:36

The word karakeeb, Karakeep, it basically

2:39

means odds and ends, miscellaneous clutter, stuff

2:43

that doesn't look organized, but it has personal value.

2:46

Huh.

2:47

OK.

2:47

So it's not about forcing you into neat little folders.

2:50

Not at all.

2:51

It acknowledges that, yeah, the stuff

2:53

you grab from Reddit or Twitter or Hacker News,

2:55

those random notes, those PDFs, it might look like a mess,

2:59

but it's your mess, and it's valuable.

3:00

Karakeep embraces that clutter.

3:02

And makes it searchable.

3:04

Exactly.

3:04

It gives you the tools to find things

3:06

within that valuable mess using smart tech.

3:09

I like that.

3:09

It's built for that impulse we all have.

3:11

Oh, got to save this for later.

3:13

But it turns that impulse into something, well, actually

3:15

useful long term, a real knowledge base.

3:17

Right.

3:18

It elevates the read it later idea.

3:19

So what exactly can it hoard?

3:23

What does the everything part cover?

3:24

It's pretty broad, actually.

3:26

It handles your standard web links, obviously,

3:28

but also simple notes.

3:29

You can just jot things down directly.

3:31

Images, PDFs, too.

3:33

OK.

3:34

And when you save a link, it's not just storing the URL.

3:37

It automatically goes out and fetches

3:39

the page title, the description, maybe a preview image,

3:42

gives you context right away.
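
To make the preview step concrete, here is a minimal sketch of pulling a title, description, and preview image out of a page's HTML. It uses only Python's standard library and assumes the page carries a `<title>` tag and common `description` / `og:image` meta tags. The names `LinkPreviewParser` and `extract_preview` are hypothetical, and this is an illustration of the idea, not Karakeep's actual crawler.

```python
from html.parser import HTMLParser


class LinkPreviewParser(HTMLParser):
    """Collects the <title> text plus description and og:image meta tags."""

    def __init__(self):
        super().__init__()
        self.preview = {"title": "", "description": None, "image": None}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Plain meta tags use name="...", Open Graph tags use property="..."
            key = attrs.get("name") or attrs.get("property")
            content = attrs.get("content")
            if key in ("description", "og:description"):
                if self.preview["description"] is None:
                    self.preview["description"] = content
            elif key == "og:image" and self.preview["image"] is None:
                self.preview["image"] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.preview["title"] += data


def extract_preview(html: str) -> dict:
    """Return the title, description, and image URL found in the HTML."""
    parser = LinkPreviewParser()
    parser.feed(html)
    parser.close()
    parser.preview["title"] = parser.preview["title"].strip()
    return parser.preview
```

Fetching the HTML itself is left out on purpose; any HTTP client would do, and the parsing is the part that turns a bare URL into something with context.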

3:43

Which is way better than just a list of naked URLs.

3:46

Much better.

3:47

But then you get to the really interesting part,

3:49

the intelligence layer, as you called it.

3:51

Yeah, the AI stuff.

3:53

But that seems like the secret sauce here.

3:54

AI tagging, summarization.

3:57

So imagine you save a long, complex article.

4:00

Karakeep can use AI, potentially external services,

4:03

like OpenAI's models, to automatically

4:05

suggest relevant tags.

4:07

Or even generate a quick summary for you.

4:09

OK, that's useful.

4:10

But hang on, if it's self-hosted,

4:12

sending my saved articles to an external AI,

4:14

doesn't that kind of defeat the purpose of privacy?

4:17

Ah, excellent point.

4:18

And the developers thought of that.

4:20

This is key.

4:21

Karakeep is specifically designed

4:23

to work with local AI models.

4:25

It supports a framework called Ollama.

4:27

Ollama, right.

4:28

So I can run an AI model right there on my own server.

4:31

Exactly.

4:31

You can download and run various open source AI models locally

4:34

using Ollama, and Karakeep talks to that.

4:37

So you get the smart tagging and summarization,

4:39

but your data never leaves your control.

4:41

No third party clouds involved, unless you explicitly

4:44

choose that.
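
As a rough illustration of what "talking to" a local model can look like, the sketch below asks an Ollama server on the same machine for tag suggestions. It assumes Ollama is running on its default local port and that a model has already been pulled. The model name `llama3`, the prompt wording, and the helper names are assumptions for this example, and it does not claim to mirror how Karakeep builds its own requests.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_tag_request(text: str, model: str = "llama3") -> dict:
    """Build the JSON body for a non-streaming generate call."""
    prompt = (
        "Suggest up to five short topic tags for the text below. "
        "Reply with the tags only, separated by commas.\n\n" + text
    )
    return {"model": model, "prompt": prompt, "stream": False}


def parse_tags(reply: str) -> list[str]:
    """Split a comma-separated reply into clean, de-duplicated tags."""
    tags = []
    for raw in reply.split(","):
        tag = raw.strip().lower()
        if tag and tag not in tags:
            tags.append(tag)
    return tags


def suggest_tags(text: str, model: str = "llama3") -> list[str]:
    """Send the text to the local model and return its suggested tags."""
    body = json.dumps(build_tag_request(text, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        reply = json.loads(response.read())["response"]
    return parse_tags(reply)
```

The point of the design is that the only network hop is to `localhost`, so the saved article text stays on the machine you control.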

4:45

That's huge for the self-hosting crowd.

4:47

Privacy and power.

4:48

It's a major selling point.

4:50

And speaking of keeping your data safe and useful,

4:53

let's talk about link rot again.

4:55

Yes, the bane of bookmarks.

4:57

Click a link you saved a year ago, and poof, 404 not found.

5:03

It undermines the whole idea of a personal archive, right?

5:05

So how does Karakeep tackle that?

5:07

Seems like an impossible fight.

5:09

Well, it uses some clever archival techniques.

5:11

It doesn't just save the link.

5:12

It aims for full-page archival,

5:14

meaning it uses tools like one called Monolith

5:17

to essentially download the entire live web page.

5:19

Yeah.

5:20

Not just the text, but the images, the formatting,

5:22

the CSS, everything.

5:24

Wow.

5:25

And it bundles all of that into a single, self-contained HTML

5:28

file that lives on your server.

5:30

So if the original website disappears tomorrow,

5:32

you still have the complete readable content saved.
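
One way to picture a "single, self-contained HTML file" is that external assets get embedded directly in the markup as `data:` URIs. The toy sketch below does that for image `src` attributes only, given assets that were already downloaded. It is a simplified illustration of the general technique with hypothetical helper names, not a description of how Monolith itself is implemented.

```python
import base64
import re


def to_data_uri(mime: str, payload: bytes) -> str:
    """Encode raw bytes as a data: URI that a browser can render inline."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def inline_images(html: str, assets: dict) -> str:
    """Replace each src="..." that has a downloaded asset with a data: URI.

    `assets` maps the original URL to a (mime_type, bytes) pair.
    """

    def replace(match):
        url = match.group(2)
        if url not in assets:
            return match.group(0)  # leave unknown references untouched
        mime, payload = assets[url]
        return f"{match.group(1)}{to_data_uri(mime, payload)}{match.group(3)}"

    return re.sub(r'(src=")([^"]*)(")', replace, html)
```

A real archiver also has to handle stylesheets, fonts, scripts, and relative URLs, which is why a dedicated tool is used for the job.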

5:35

OK, that's not just a bookmark.

5:36

That's like taking a perfect permanent snapshot

5:39

of the page, a portable copy.

5:41

Precisely.

5:42

It's robust archiving.

5:44

And it does something similar for media, too.

5:46

If you save, say, a YouTube link.

5:48

Don't tell me it saves the video.

5:50

It can.

5:51

It uses tools like yt-dlp, which many might know,

5:54

to automatically download and archive the video file itself.

5:56

So my saved stuff is basically immune to the original source

5:59

disappearing, link-proof and media-proof, kind of.

6:02

That's the goal, to make your archive truly resilient.

6:05

OK, so we've got all this stuff saved, links, notes, images,

6:08

PDFs, archived pages, even videos.

6:11

How on earth do you find anything

6:13

in this potentially massive pile of, what was it, karakeeb?

6:17

Well, yes, your personal karakeeb.

6:20

Well, search is critical, obviously.

6:21

And Karakeep uses a modern search engine called Meilisearch.

6:25

Meilisearch, I've heard of that.

6:26

Supposed to be fast.

6:27

Very fast.

6:28

And it provides full text search across everything.

6:31

Your notes, the original link URLs, descriptions, tags,

6:35

and the actual content of those fully archived pages

6:38

we just talked about.

6:39

Everything is indexed.

6:40

Everything.
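
For a feel of what "everything is indexed" means, here is a toy inverted index in Python. Each saved item contributes the words from all of its text fields (title, note body, archived page text, OCR output), and a query returns the items containing every query word. This is a deliberately tiny sketch with hypothetical names; a real engine like Meilisearch adds ranking, typo tolerance, and much more.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Maps each word to the set of item ids whose fields contain it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, item_id: str, *fields: str) -> None:
        """Index every word from every text field of one saved item."""
        for field in fields:
            for word in tokenize(field):
                self._postings[word].add(item_id)

    def search(self, query: str) -> set:
        """Return ids of items that contain all words in the query."""
        words = tokenize(query)
        if not words:
            return set()
        results = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            results &= self._postings.get(word, set())
        return results
```

Because text recovered from an image is just another field passed to `add`, a photo becomes findable the same way a note is.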

6:41

But they went even further.

6:42

They included OCR.

6:43

OCR, Optical Character Recognition for Images.

6:46

Exactly.

6:47

So let's say you saved a screenshot of something

6:49

important, or maybe a photo of a whiteboard diagram,

6:52

or even a restaurant menu you wanted to remember.

6:55

Karakeep's OCR will actually scan that image,

6:58

find any text within it, and make that text searchable too.

7:02

Whoa.

7:02

So I could search for meeting notes,

7:04

and it might find that whiteboard photo I saved.

7:06

That's the idea.

7:07

It makes even your visual clutter searchable.

7:09

It's a really thoughtful usability feature.

7:11

That's actually incredibly useful.

7:13

OK, so the back end is powerful.

7:15

Archiving is robust.

7:16

Search is smart.

7:18

What about just using it day to day?

7:20

Is it easy to get stuff in?

7:21

They seem to have put effort into that too.

7:24

There are browser extensions, naturally, for Chrome, Firefox,

7:28

which make saving links quick.

7:29

Standard stuff, but essential.

7:31

Right, and native mobile apps for iOS and Android,

7:34

so you can access and save stuff on the go.

7:36

Good.

7:36

Plus, for more advanced users, there's a REST API,

7:39

support for bulk actions if you're importing lots of stuff,

7:42

even SSO (single sign-on) integration.

7:45

And yes, there's a dark mode.

7:47

Dark mode, always important.

7:49

OK, let's peek under the hood a bit.

7:51

For listeners thinking about running this themselves,

7:53

the tech stack gives clues about stability, how well it's built.

7:58

Sure.

7:59

It's definitely a modern stack.

8:00

The front end uses Next.js with the app router, which

8:03

is pretty current and performant.

8:05

For the database side, they use something called Drizzle ORM.

8:08

Authentication is handled by NextAuth.

8:10

And for the communication between the browser and the server,

8:12

they use tRPC.

8:14

Whoa, OK, lots of names there.

8:15

Drizzle, tRPC, NextAuth.

8:18

For someone maybe just dipping their toes into self-hosting,

8:21

does this mean it's super complicated to set up

8:23

and run compared to, say, an older PHP app?

8:26

That's a fair question.

8:27

I mean, yes, the underlying tech is sophisticated.

8:31

But the reason developers choose tools like Drizzle or tRPC

8:35

is often to make things more reliable and faster

8:37

in the long run.

8:38

OK.

8:39

tRPC, for example, helps prevent certain kinds of bugs

8:42

between the front end and back end.

8:44

Drizzle offers strong typing for database queries.

8:47

The initial setup likely involves

8:49

Docker, which is standard for self-hosting these days,

8:52

but once it's running.

8:53

The goal is stability and speed.

8:55

Exactly.

8:56

The complexity is there to provide a smoother, faster

8:59

experience, especially as your archive grows.

9:02

You don't want it bogging down when

9:04

you have thousands of items saved.

9:06

This stack is built for scale.

9:08

Right, complex engine for simple, fast driving.

9:10

Makes sense.

9:11

So why did the creator build this?

9:13

Was it just a technical challenge?

9:15

It was partly that, yeah.

9:16

The creator is a systems engineer,

9:18

so they have the skills.

9:19

And they mentioned wanting to keep their web development

9:21

skills sharp, but mostly it came from a personal need.

9:24

A frustration with existing tools.

9:27

Pretty much.

9:28

They were already a heavy user of bookmarking and note

9:31

taking apps.

9:32

They mentioned getting hooked on the idea by Pocket initially.

9:36

Like many of us.

9:37

But Pocket is proprietary, cloud based.

9:39

Once they moved towards self-hosting, that was out.

9:42

They apparently liked another app called Memos for quick notes.

9:45

Memos, yeah.

9:46

That's another popular self-hosted one.

9:48

Right.

9:48

But they found Memos lacked crucial features

9:51

for their way of saving stuff.

9:53

Specifically, link previews, seeing

9:55

what a link was about instantly.

9:57

And importantly, automatic tagging.

9:59

Ah, back to the AI tagging.

10:01

Exactly.

10:02

Without that, their saved links just

10:04

became this massive, unmanageable list.

10:07

Basically, unusable clutter.

10:09

So Karakeep was born out of that need

10:11

to add intelligence and better archiving to the self-hosted

10:14

note-taking idea.

10:15

Got it.

10:16

That personal story really helps place Karakeep

10:18

in the competitive landscape.

10:21

To really get why someone would pick this,

10:22

we should probably compare it directly to some alternatives.

10:25

Definitely.

10:26

And Karakeep really does sit in a specific, interesting

10:28

spot.

10:29

It's trying to blend the polish you see in some commercial apps

10:32

with the core principle of self-hosted independence.

10:35

OK, so who are the main competitors or inspirations?

10:37

You mentioned Pocket.

10:38

Any others on the commercial side?

10:40

The creator specifically mentioned MyMind

10:43

as a close inspiration.

10:45

MyMind is known for its very visual, AI-powered

10:48

organization.

10:50

Looks great, works smart.

10:51

But it's commercial, proprietary, cloud only.

10:55

Karakeep aims for that same kind of smart visual feel,

10:58

but puts you in control of the data and the hosting.

11:01

And Pocket, as we said, got the creator hooked.

11:03

But again, no self-hosting option.

11:06

Right, so what about the open source rivals?

11:08

We mentioned Memos.

11:09

Yep, Memos is great for notes, but Karakeep

11:12

adds the archiving, the previews, the AI tags

11:15

that Memos lacks.

11:16

Then there's Omnivore.

11:17

Omnivore, yeah, another read it later open source option.

11:19

It is, and it's cool.

11:21

But apparently its architecture relies pretty heavily

11:23

on Google Cloud infrastructure right now,

11:25

which makes true self-hosting, like completely independent

11:29

self-hosting, a bit difficult, or at least not

11:31

their main focus.

11:32

Whereas for Karakeep, self-hosting

11:33

is priority number one.

11:35

Exactly, it's designed first for self-hosting.

11:37

Then you have the older, really established players,

11:39

like Wallabag.

11:40

Wallabag's been around forever, right?

11:42

PHP-based.

11:43

Yeah, very mature project.

11:45

But maybe the UI feels a bit dated to some.

11:48

That was the creator's perspective anyway.

11:50

And finally, there are other open source link managers

11:53

like Linkwarden or Shiori.

11:55

OK, and how do they stack up?

11:56

They definitely fulfill the self-hosting need,

11:59

but they generally lack that sophisticated AI layer,

12:03

the automatic tagging, the summarization, the OCR search

12:07

that Karakeep is really leaning into.

12:09

So Karakeep's niche is becoming really clear.

12:11

It's for people who want that cutting edge AI organization

12:14

plus serious archiving against link rot

12:17

and are committed to self-hosting.

12:19

That's it, precisely.

12:20

It's for the power user, maybe, who

12:22

sees the value in AI tools but doesn't

12:24

want to hand their data over to a big corporation to get it.

12:27

OK, perfect.

12:27

Let's try and synthesize this.

12:28

What are the key takeaways for you, the listener,

12:30

considering Karakeep?

12:32

Well, first, you're looking at a really robust system

12:35

for tackling digital clutter.

12:37

It's built to be future-proof.

12:38

With that strong archival focus.

12:40

Right, fighting link rot.

12:41

Second, it's open source, under the AGPL 3.0 license.

12:46

And despite being relatively new,

12:47

it's got serious momentum.

12:49

You mentioned the GitHub stats.

12:50

Yeah, over 20,000 stars, nearly 1,000 forks,

12:53

that's a lot of interest.

12:55

It really is.

12:56

That suggests an active community,

12:58

ongoing development, bug fixes, new features.

13:01

It's not likely to just disappear.

13:03

That community support is vital for open source projects.

13:07

Okay, so, final thoughts.

13:08

Something provocative for people to chew on.

13:10

I think it comes back to that core idea we started with,

13:13

bringing power back to the user.

13:15

We're seeing these incredibly advanced capabilities,

13:18

AI summarization, classification, deep search,

13:21

that used to be exclusive to giant cloud platforms.

13:24

And now they're running on our machines.

13:25

Exactly.

13:26

When you control the hardware your knowledge lives on,

13:29

and you control the AI that helps you understand

13:32

and organize that knowledge,

13:34

well, that fundamentally changes your relationship

13:35

with information, doesn't it?

13:37

How so?

13:37

You shift from just being a consumer reliant on platforms

13:41

to being like an independent owner and curator

13:43

of your own digital brain,

13:45

your own knowledge infrastructure.

13:46

Owning your knowledge infrastructure.

13:49

That is a powerful thought,

13:51

has huge implications for how we manage information,

13:54

how we learn, maybe even how we think going forward.

13:57

Okay, on that note, just a final reminder

14:00

that this deep dive was supported by safeserver.de.

14:03

They handle hosting for software like karakeep

14:05

and can help with your digital transformation.

14:08

Check them out at www.safeserver.de.

14:13

And yeah, if this sparked your interest,

14:15

definitely explore the world of self-hosting

14:17

and these kinds of advanced organization tools.

14:19

It's a fascinating space.

14:20

Absolutely, thanks for diving deep with us today.

14:22

We'll catch you on the next one.
