Today's Deep-Dive: Harper
Ep. 266

Episode description

The Deep Dive explores Harper, an open-source, privacy-first grammar checker developed by Automattic, the company behind WordPress.com. Harper addresses the 'triple threat' of modern editing software: poor privacy, bad performance, and unnecessary cost. Unlike cloud-based tools such as Grammarly, which send user data to remote servers, Harper operates entirely offline, ensuring complete privacy. It also outperforms the major open-source alternative, LanguageTool, which is resource-intensive and slow. Harper achieves this by using carefully optimized, hand-crafted rules and efficient algorithms, all built with the programming language Rust. Rust's memory safety and speed contribute to Harper's impressive efficiency: it uses less than 1/150th of LanguageTool's memory. Harper is also portable, able to run via WebAssembly, making it easy to integrate into various applications. The project is open-source under the Apache 2.0 license and has gained significant community support, with extensive documentation for integration into major text editors. Harper's commitment to performance and privacy makes it a promising solution for users concerned about data security and efficiency.

Gain digital sovereignty now and save costs

Let's have a look at your digital challenges together. What tools are you currently using? Are your processes optimal? And what about the state of your backups and security updates?

Digital sovereignty is easily achieved with Open Source software (which usually costs far less, too). Our division SafeServer offers hosting, operation, and maintenance for countless Free and Open Source tools.

Try it now for 1 Euro - 30 days free!

0:00

Welcome back to the Deep Dive, the place where we distill complex information into

0:04

high-value knowledge nuggets custom-tailored for you.

0:08

Before we jump into today's fascinating analysis of a really interesting privacy-first

0:12

software project, we want to acknowledge the support that makes these deep dives

0:16

possible.

0:17

This exploration is supported by Safe Server. They manage hosting for the next

0:20

generation of software and support clients with digital transformation.

0:24

You know, when we talk about incredible new, efficient software like we are today,

0:28

someone needs to host it securely. Safe Server is there for you. Find out more at

0:32

www.safeserver.de.

0:34

Okay, let's unpack this. Our mission today is a deep dive into Harper, an open-source

0:40

grammar checker.

0:42

It's a project created by the people at Automattic, you know, the company behind

0:45

WordPress.com. So, if you're like most people, you probably use writing tools, right?

0:49

But maybe you also worry about what happens to your words after you hit save.

0:52

Harper promises to be the solution.

0:54

We're going to try and understand how this tool tackles what we're calling the

0:57

triple threat of modern editing software.

0:59

Poor privacy, really bad performance, and unnecessary cost.

1:02

Yeah, and if we connect this to the bigger picture, I mean, this project signals a

1:05

pretty crucial shift.

1:07

The developers weren't just aiming for, like, feature parity with other tools. They

1:11

were aiming for a total paradigm shift toward user control.

1:15

The stated goal is really clear. Harper is an offline, privacy-first grammar checker.

1:20

Fast, open-source, Rust-powered.

1:23

And that last phrase, Rust-powered, that's kind of the secret ingredient that lets

1:26

them achieve everything else.

1:28

Absolutely. And the source material is fantastic because it doesn't just introduce

1:32

a new tool.

1:33

It really acts as a kind of postmortem on the failures of the tools we've, well,

1:37

basically been forced to use until now.

1:39

So why did the creators feel the necessity to build a new grammar checker from

1:43

scratch?

1:44

Well, it seems like it was born out of some profound professional frustration.

1:47

Let's start with the critique of the market leader, Grammarly.

1:51

The developers considered it fundamentally flawed, especially for professional,

1:55

high-volume use.

1:57

First, they called it too expensive and too overbearing.

2:01

Now, that's a bit subjective, sure, but the objective issues are maybe more critical.

2:06

The suggestions often lacked necessary context or, as the source materials bluntly

2:10

put it, were sometimes just plain wrong.

2:13

Right. And then we get to the core ethical problem, the thing that really dictates

2:16

how we feel about using these tools.

2:18

Exactly. It was definitively labeled a privacy nightmare.

2:22

I mean, when you use a cloud-based checker like Grammarly, everything you type,

2:26

every sensitive document, every internal memo, it gets sent off to their remote

2:30

servers.

2:31

And while companies might claim they don't sell the data, the source material

2:34

correctly points out the concern.

2:36

This data can be used for training large language models or maybe other proprietary

2:40

purposes we simply don't have visibility into.

2:43

And that ties directly back to performance too, doesn't it?

2:46

Because even if you somehow trust the vendor completely, sending every single word

2:49

you write back and forth across the internet just to get a basic suggestion, that

2:54

must slow down the process considerably.

2:56

Oh, it absolutely does. That network round-trip time, you know, waiting for the

3:00

server to process and respond, it made revising work tedious and slow.

3:05

For someone trying to maintain that flow state while writing, that kind of lag is

3:09

just a constant disruption. It pulls you right out.

3:12

OK, so if the cloud-based solution is a privacy disaster zone, what about the major

3:17

open-source alternative, LanguageTool?

3:20

I mean, that should solve the privacy problem, shouldn't it? But the source

3:22

material suggests it has its own pretty massive issues.

3:26

Yeah, LanguageTool is characterized as great, but, and it's a big but, only if you're

3:31

willing to dedicate an enormous amount of computing muscle to it.

3:35

Its resource demands are staggeringly high. It requires gigabytes of RAM, and crucially,

3:40

it forces the user to download this massive statistical package.

3:44

It's known as an n-gram dataset, and it weighs in at around 16 gigabytes.

3:49

16 gigs, just for a grammar checker. Wow. We need to pause there for a second. What

3:56

exactly is an n-gram dataset, and why does LanguageTool need such a colossal file?

4:01

Right, that's essential context. An n-gram dataset is basically a massive

4:05

statistical table. It's built from analyzing billions and billions of sentences.

4:10

It tells the software how likely certain words are to follow other words. So if you

4:14

see "the dog ran", the n-gram model confirms, yeah, that's a highly probable

4:18

sequence.

4:19

LanguageTool uses this statistical approach to catch errors. But to make those

4:23

predictions accurate across many languages, you need a gigantic corpus of data, and

4:28

that data is the 16 gigabyte download.
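To make that concrete, here's a toy sketch of the idea behind an n-gram (specifically bigram) model: count how often each word pair appears in a corpus, then turn the counts into probabilities. Everything here, the BigramModel type and the tiny corpus, is purely illustrative and not LanguageTool's actual code; it's the billions-of-sentences scale that turns a table like this into a multi-gigabyte download.

    use std::collections::HashMap;

    /// Toy bigram model: counts how often `next` follows `prev`, then turns
    /// those counts into probabilities. Real n-gram datasets do this over
    /// billions of sentences, which is why they weigh in at gigabytes.
    struct BigramModel {
        pair_counts: HashMap<(String, String), u32>,
        prev_counts: HashMap<String, u32>,
    }

    impl BigramModel {
        fn train(corpus: &[&str]) -> Self {
            let mut pair_counts = HashMap::new();
            let mut prev_counts = HashMap::new();
            for sentence in corpus {
                let words: Vec<&str> = sentence.split_whitespace().collect();
                for pair in words.windows(2) {
                    *pair_counts
                        .entry((pair[0].to_string(), pair[1].to_string()))
                        .or_insert(0u32) += 1;
                    *prev_counts.entry(pair[0].to_string()).or_insert(0u32) += 1;
                }
            }
            BigramModel { pair_counts, prev_counts }
        }

        /// P(next | prev): how plausible is this two-word sequence?
        fn probability(&self, prev: &str, next: &str) -> f64 {
            let pair = (prev.to_string(), next.to_string());
            match (self.pair_counts.get(&pair), self.prev_counts.get(prev)) {
                (Some(&c), Some(&t)) if t > 0 => c as f64 / t as f64,
                _ => 0.0,
            }
        }
    }

    fn main() {
        let corpus = ["the dog ran home", "the dog slept", "the cat ran away"];
        let model = BigramModel::train(&corpus);
        println!("P(ran | dog) = {:.2}", model.probability("dog", "ran"));   // common sequence
        println!("P(away | dog) = {:.2}", model.probability("dog", "away")); // unlikely sequence
    }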

4:30

Okay, so LanguageTool is privacy-friendly because it runs locally, right on your

4:33

machine. But it's resource intensive because it's essentially hauling around this

4:38

library the size of a small country just to check sentence flow.

4:41

Precisely. And that sheer bulk translates directly into speed problems. The creator

4:45

found it too slow. It often took several seconds to lint even a moderately sized

4:49

document.

4:50

Let's quickly clarify that term for our listeners. We keep using the word lint.

4:53

What does linting actually mean here?

4:56

Sure. Linting is essentially the software checking the quality and correctness of

5:01

your text or sometimes code.

5:04

So when we say lint time, we mean the time it takes the software to scan your

5:07

document, run all its rules against it, and then present the errors or suggestions.

5:12

And if that takes several seconds, yeah, it just kills your productivity, you lose

5:16

momentum.
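As a rough picture of what a lint pass produces, here's a minimal sketch using invented names (Lint, double_space_rule) rather than Harper's real internals: every rule scans the text and emits findings with a location, a message, and an optional suggestion, which the editor then renders as underlines or quick fixes.

    /// One finding from a lint pass: where the issue is and what to do about it.
    /// (Invented type for illustration; Harper's real diagnostics differ.)
    struct Lint {
        start: usize,
        end: usize,
        message: String,
        suggestion: Option<String>,
    }

    /// "Linting" = run every rule over the text and collect the findings.
    fn lint(text: &str, rules: &[fn(&str) -> Vec<Lint>]) -> Vec<Lint> {
        rules.iter().flat_map(|rule| rule(text)).collect()
    }

    /// Example rule: flag consecutive spaces and suggest a single space.
    fn double_space_rule(text: &str) -> Vec<Lint> {
        text.match_indices("  ")
            .map(|(i, _)| Lint {
                start: i,
                end: i + 2,
                message: "Consecutive spaces".to_string(),
                suggestion: Some(" ".to_string()),
            })
            .collect()
    }

    fn main() {
        let rules: &[fn(&str) -> Vec<Lint>] = &[double_space_rule];
        for f in lint("This  sentence has a  problem.", rules) {
            println!("{}..{}: {} -> {:?}", f.start, f.end, f.message, f.suggestion);
        }
    }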

5:17

Okay, that sets the stage perfectly for Harper then, because here's where the

5:23

comparison becomes really stark. Harper was explicitly engineered to be like the

5:27

Goldilocks solution: fast, small, and completely private.

5:30

Exactly. Harper's gains are, well, transformational seems like the right word.

5:34

First, the privacy promise is absolute. It is completely private. It functions

5:38

entirely offline.

5:40

Your machine is the beginning and the end of the data flow, period. But the

5:43

performance metrics, I mean, that's what really blows the competition out of the

5:47

water.

5:48

While LanguageTool takes several seconds and demands gigabytes of RAM, Harper

5:52

checks the same document in milliseconds. And regarding memory footprint, it uses

5:56

less than 1/150th of LanguageTool's memory requirements.

6:00

That's a staggering efficiency gain. But OK, this is the point where I have to be a

6:04

bit skeptical. Surely for a tool to be that fast and that small, it must compromise

6:08

on intelligence, right?

6:10

How can Harper possibly manage accuracy without that huge 16-gigabyte n-gram file

6:15

and without cloud-based AI training?

6:18

That's the critical question, you're right. And it's answered by the choice of

6:22

technology. Instead of relying on that brute-force statistics approach, the n-grams,

6:26

Harper relies on carefully optimized, hand-crafted rules, dictionaries, and highly

6:31

efficient algorithms. And crucially, these are all built using a programming

6:35

language called Rust.
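For a feel of the rules-and-dictionary approach, here's a deliberately simplified sketch, not Harper's actual implementation: a small word list catches spelling slips and a hand-written rule catches repeated words, with no statistical corpus in sight.

    use std::collections::HashSet;

    fn main() {
        // A tiny in-memory word list; Harper ships a curated dictionary rather
        // than a multi-gigabyte statistical corpus. (Words here are illustrative.)
        let dictionary: HashSet<&str> =
            ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
                .into_iter()
                .collect();

        let text = "the quik brown fox jumps over the the lazy dog";

        // Rule 1: spelling — flag any word the dictionary doesn't know.
        for word in text.split_whitespace() {
            if !dictionary.contains(word) {
                println!("unknown word: {word}");
            }
        }

        // Rule 2: a hand-crafted grammar rule — repeated adjacent words ("the the").
        let words: Vec<&str> = text.split_whitespace().collect();
        for pair in words.windows(2) {
            if pair[0] == pair[1] {
                println!("repeated word: {}", pair[0]);
            }
        }
    }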

6:37

OK, we've mentioned Rust powered several times now. Why is Rust the key here,

6:41

especially for that staggering efficiency gain, particularly regarding memory use?

6:46

Well, Rust is known for a couple of things. Memory safety and extreme speed. And

6:50

the memory savings compared to LanguageTool, which is Java-based, are enormous.

6:56

It really comes down to how the languages handle memory management. You see, Java

6:59

relies on something called a garbage collector, or GC.

7:04

It's a process that runs in the background to automatically find and clean up

7:07

memory that's no longer being used.

7:09

But that GC itself requires constant runtime resources and adds significant memory

7:13

overhead. It's always kind of there.

7:16

Rust, on the other hand, manages memory differently. It figures it out at compile

7:19

time when the program is built.

7:21

It guarantees memory safety without needing a runtime garbage collector.

7:25

And this mechanism, this absence of a GC, is why Harper can deliver the same or

7:29

similar results with almost no memory overhead.

7:32

That's how it achieves that astonishing reduction, down to less than a hundred and

7:35

fiftieth of LanguageTool's requirements.
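A tiny example of that compile-time memory management, using plain Rust (this is general language behavior, nothing Harper-specific): ownership tells the compiler exactly where each allocation can be freed, so no collector has to run alongside the program.

    fn word_count(text: String) -> usize {
        // `text` is owned here; the compiler schedules its cleanup for the
        // closing brace below. No garbage collector ever has to find it.
        text.split_whitespace().count()
    } // `text` is freed deterministically at this point, decided at compile time.

    fn main() {
        let document = String::from("Harper checks this sentence locally.");
        let words = word_count(document); // ownership of the String moves into the call
        // `document` can no longer be used here; the compiler enforces that,
        // which is how Rust gets memory safety without runtime GC overhead.
        println!("{words} words checked");
    }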

7:38

Ah, okay. That distinction, getting rid of the garbage collector overhead, that's

7:42

the missing technical piece I needed.

7:44

That helps understand the efficiency gains. It sounds like a design choice that

7:47

prioritizes performance above everything else.

7:49

Exactly. And that focus on efficiency also leads directly to its portability.

7:54

The source notes that Harper is small enough to load via WebAssembly, or Wasm.

7:59

Right, Wasm. For our listeners, what's the practical implication? What's the so-what

8:03

of WebAssembly?

8:05

Think of Wasm as like a tiny, standardized virtual machine.

8:09

It lets pre-compiled code run almost instantly inside any modern web browser or

8:13

application environment, so the practical implication is huge.

8:17

You don't need to install anything complex. You don't need an API key. You

8:20

certainly don't need a server.

8:22

It makes Harper truly ubiquitous. You could integrate a grammar checker that runs

8:26

at near-native speeds right into a browser application almost instantly.

8:29

It really reinforces that promise of quick, local, and private processing.
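As a sketch of what that looks like from the Rust side, this is roughly how a function gets exposed to the browser through WebAssembly with the widely used wasm-bindgen toolchain; the function itself is a made-up placeholder, not Harper's actual API.

    // Cargo.toml would add: wasm-bindgen = "0.2" and crate-type = ["cdylib"].
    use wasm_bindgen::prelude::*;

    /// Exported to JavaScript once compiled to WebAssembly.
    /// A browser can call this on every keystroke with no server round-trip.
    #[wasm_bindgen]
    pub fn count_issues(text: &str) -> u32 {
        // Placeholder standing in for a real grammar pass:
        // count runs of consecutive spaces, entirely on the user's machine.
        text.matches("  ").count() as u32
    }

Once compiled (for example with wasm-pack), the JavaScript side imports the module like any other package, which is what makes the no-install, no-server story possible.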

8:33

And finally, it is truly open source. It's under the Apache 2.0 license, available

8:36

at writewithharper.com.

8:39

Okay, so now let's talk about the practical application side. What does this all

8:42

mean for the person actually sitting down and writing?

8:45

Well, the ecosystem looks pretty robust. Currently, the project is focused on

8:49

English support only. That's important to note.

8:52

However, the core architecture was explicitly designed from the ground up to be extensible

8:56

for other languages, and the project clearly states they welcome contributions.

9:02

So expanding its linguistic reach seems like it's simply a matter of community

9:05

effort, building on what's already a very solid, very fast foundation.

9:10

And that commitment to performance, it isn't just a launch feature, is it? It

9:14

sounds like an ongoing philosophy.

9:16

It absolutely is. It seems like the defining feature, really. The development team

9:19

is so serious about maintaining peak performance that they state they consider

9:23

long lint times bugs.

9:25

That signals they are constantly profiling and optimizing the code. They treat any

9:30

slowdown not as a minor issue, but as a critical defect that must be addressed

9:34

immediately. That's a strong commitment.

9:36

And looking at the health of the project, the community metrics, they speak volumes,

9:39

don't they? 8.1k stars on GitHub, 202 forks, 57 contributors.

9:44

For an open source project, 8,100 stars indicate significant developer trust and viability.

9:50

It shows this is well past the side project stage. It looks like a major ecosystem

9:53

tool now.

9:54

Yeah, definitely. And that level of community engagement has translated directly

9:58

into real-world usability. They have extensive documentation for integrating Harper

10:03

into, well, virtually every major professional text editor you can think of.

10:07

They list support for Visual Studio Code, NeoVim, Helix, Emacs, and Zed. Pretty

10:11

comprehensive.

10:13

Why is integration into specialized tools like NeoVim and Helix such a big deal?

10:15

What does that tell us?

10:17

Well, it demonstrates its incredible lightness, its efficiency. NeoVim and Helix

10:22

are terminal-based, very power-user-focused environments. They're often preferred

10:27

by developers specifically because they're fast and use very few resources.

10:32

The fact that Harper integrates seamlessly there proves that it's truly lightweight

10:36

enough for even the most resource-conscious environments. It really solidifies its

10:40

place as a professional, ready-to-use tool, not just some proof of concept.

10:45

OK, so to summarize the key takeaway for you, the listener, Harper seems to

10:49

successfully solve that triple threat we talked about with modern editing software,

10:53

the lack of privacy, the high cost or resource usage, and the poor performance.

10:58

And it does this by leveraging the speed and memory efficiency of modern languages

11:02

like Rust, combined with the transparent, collaborative power of the open-source

11:06

model. It really could be a blueprint for how critical software like this can be

11:10

built in the future.

11:11

Yeah, and this whole thing raises an important question, I think, for you to

11:14

consider. As AI models and editing tools become more ubiquitous, more powerful, but

11:20

also demand more and more of your data to function, how much value do you

11:25

personally place on processing the sensitive documents you're writing locally, for

11:29

that guaranteed absolute privacy?

11:32

How does that compare to the perceived convenience, or maybe the occasional smarter

11:36

suggestion offered by those massive cloud-based services? That trade-off, you know,

11:41

between privacy and perhaps features or intelligence, that's a calculation we all

11:45

probably need to be ready to make more consciously going forward.

11:49

That's a fascinating dilemma to mull over. Thank you for diving deep with us today.

12:03

And remember, this deep dive was supported by SafeServer, your partner for hosting

12:03

your project at www.safeserver.de. We'll catch you next time for the next Deep Dive.
