Today's Deep-Dive: CKAN
Ep. 313

Episode description

The Comprehensive Knowledge Archive Network (CKAN) is the world’s leading open-source data management system, serving as the digital backbone for governments and organizations globally. It turns vast, often disorganized collections of digital data into standardized, accessible, and usable formats, much like an efficient library catalog for datasets. Its open-source nature fosters trust and long-term stability: the code can be publicly audited and adopters avoid vendor lock-in, which is crucial for critical infrastructure. The platform is built primarily in Python, a mature and stable language, and its health is reflected in significant community activity on GitHub. CKAN powers major national open data portals, such as data.gov in the US and open.canada.ca, as well as vital humanitarian data hubs. Its adoption spans continents: Canada uses it to manage tens of thousands of datasets, Singapore runs its national data portal on it, and Australia uses it to publish data from over 800 organizations. Beyond public transparency, major companies use CKAN for internal data governance, managing sensitive information and breaking down data silos within private networks. Recognized as a Digital Public Good (DPG), CKAN contributes to the UN Sustainable Development Goals by enhancing transparency and data accessibility. The nonprofit Open Knowledge Foundation stewards CKAN, ensuring it remains a neutral, accessible global public asset. The platform offers a user-friendly front end, a powerful API for programmatic access, and integrated visualization tools, which speed up data understanding and integration. The CKAN community provides numerous avenues for engagement, including webinars, meetups, and chat channels, highlighting its dynamic ecosystem. Ultimately, CKAN represents a future built on shared, accessible knowledge, potentially transforming global development and governance.

Gain digital sovereignty now and save costs

Let’s have a look at your digital challenges together. What tools are you currently using? Are your processes optimal? How is the state of backups and security updates?

Digital sovereignty is easily achieved with open source software (which usually costs far less, too). Our division Safeserver offers hosting, operation and maintenance for countless Free and Open Source tools.

0:00

Welcome to the deep dive. We're here to give you the context, the facts, everything

0:04

you need to feel really informed fast. Today we're getting into something pretty cool.

0:07

We're looking behind the scenes at the basic plumbing that runs the world's open

0:12

data.

0:12

We're digging into CKAN. It's this powerful system, kind of invisible usually, but

0:17

it's like the digital backbone for governments and big companies everywhere, making

0:22

all this complex info accessible, usable.

0:25

And we couldn't really get into this kind of global infrastructure without some

0:29

solid support. So a big thank you to SafeServer. They handle the really crucial

0:33

job of hosting software like CKAN, these high demand platforms.

0:36

They help organizations with their digital transformation, making sure data is secure,

0:40

always there when you need it. So if you're thinking about your own digital setup,

0:43

maybe boosting reliability with good hosting, check them out at www.safeserver.de.

0:49

Right. So today we're focused purely on CKAN. That stands for Comprehensive

0:53

Knowledge Archive Network. It's basically the leading open source data management

0:59

system, or DMS. It's a really key piece of tech globally. It powers these huge data

1:04

portals and hubs.

1:05

And our goal today isn't just to list who uses it. It's really to give you, the

1:09

listener, a straightforward kind of beginner friendly take on why it's more than

1:13

just software, why it's seen as a global public good.

1:16

Okay, let's unpack that a bit. This idea of a data management system, a DMS, sounds

1:20

a bit technical. But if whole countries are relying on CKAN, what is it actually

1:24

doing? How does it make something so complicated seem simple?

1:27

Okay, um, maybe think of CKAN like the world's best library catalog, but

1:32

specifically for digital data sets. You know how data, especially from governments

1:37

or science, it often just piles up, huge amounts, kind of disorganized.

1:41

CKAN is like the essential plumbing for that information. It's open source, and it's

1:46

really designed to make it super easy to publish it, share it, and then crucially

1:51

use it. It basically turns all that raw info into something standardized, something

1:55

you can actually search through.

1:56

And that open source part feels really important here, right? Especially when you

2:00

talk about critical infrastructure. I mean, for sensitive government data or maybe

2:03

financial stuff, you might think some private closed off software is safer.

2:08

So why is CKAN being open source actually a good thing? Why do governments and

2:11

companies trust it?

2:13

Oh, it's a massive advantage. Really, it boils down to trust and long term

2:18

stability. With proprietary software, you can get stuck with one vendor, you know,

2:23

vendor lock-in.

2:24

A government's whole digital strategy could depend on one company's, well, their

2:29

decisions, their pricing. But CKAN being open source means the code is out there.

2:33

Anyone can look at it, audit it, which kind of intuitively maybe makes it more

2:37

secure because you have potentially thousands of security experts looking at it,

2:41

not just one company's team.

2:43

Plus, you can adopt it without being tied to a single corporation.

2:46

Yeah, that makes total sense for government thinking long term. And our sources

2:49

mentioned its tech side too. It's mostly Python, is that right? And it has this

2:53

huge community activity like 4.9

2:55

thousand stars, 2.1 thousand forks on GitHub. For people who aren't developers,

2:59

what do those numbers actually mean? What does that tell us about how healthy this

3:03

platform is?

3:04

Those numbers, they basically confirm CKAN isn't some, you know, niche project. It's

3:09

a globally recognized standard. The fact it's mainly Python means it's built on a

3:14

language that's mature, stable, really good for handling big

3:18

data stuff. And the 4.9k stars. That means thousands of developers and hundreds of

3:23

organizations basically trust this code enough to like bookmark it, use it in their

3:27

own work. The 2.1k forks. That shows people are constantly taking it, tweaking it,

3:32

improving it for their own needs. It proves it's alive, you know, a dynamic

3:36

resource, not just static software.

3:38

It's pretty amazing that this one platform, kept going by its community, powers hundreds

3:42

of these data portals all over the world. So if any of us listening have like

3:46

looked up open government data,

3:48

the chances are we've used CKAN. Where exactly is it running?

3:50

Absolutely. The reach is, well, it's genuinely global. If you're looking at

3:55

official open data portals, you're almost certainly bumping into CKAN. It's behind

3:59

major national sites like catalog.data.gov in the US,

4:03

open.canada.ca/data for Canada. But it goes broader too. It's also the engine for

4:08

vital humanitarian data like on data.humdata.org. So yeah, it's fair to say it's

4:13

the world's leading open source data portal platform, no question.

4:17

And what's really fascinating is just grasping the amount of information being

4:20

handled here. Let's talk government use first, because that's where you really see

4:24

the commitment to public transparency.

4:26

We're not just talking one or two countries leading the way, are we? This sounds

4:30

like a standard across continents.

4:31

Totally. Our sources confirm it. National governments, regional bodies across the

4:36

EU, North and South America, Asia, Oceania. This wide adoption really signals a

4:41

kind of global agreement on how to best handle open data.

4:45

And to give you a sense of the scale, think about the complexity. The government of

4:49

Canada. They use CKAN for like tens of thousands of data sets, federal stuff,

4:54

everything from, I don't know, weather records to population stats.

4:58

Or look at Singapore. The Singapore government uses it for this massive national

5:02

portal covering everything: economy, education, environment, finance, health, all in

5:06

one place.

5:07

Wow. And I think the most mind boggling stat might be from Australia.

5:11

Yeah, probably. The Australian government uses CKAN to pull together and publish

5:15

data from over 800 different organizations.

5:17

Just think about that for a second, making data from 800 separate agencies, all

5:21

maybe doing things slightly differently, searchable through one single interface.

5:25

CKAN is that crucial tool that brings it all together, enforces some consistency

5:30

where it would otherwise be chaos.

5:33

OK, so it handles all this public data, sensitive stuff for big governments. I

5:38

wonder, is that security and structure why companies like it, too?

5:42

It's not just for the public sector, right? This is where it gets really

5:44

interesting for me.

5:45

How does a system built for transparency manage like confidential company data?

5:51

Exactly. And that really speaks to how robust and flexible CKAN is.

5:56

Yes, major companies use it, too. They adopt it to manage their own internal data

6:00

assets, which, you know, obviously needs a different security approach than just

6:04

publishing everything online.

6:06

Can you say a bit more about the difference? Like, when a big drug company or an

6:09

energy firm uses CKAN internally, what's the goal there?

6:12

Well, the goal shifts, right? It's less about public transparency and more about

6:18

internal governance and breaking down data silos.

6:21

You know how in big organizations, resources, energy, pharma, finance data gets

6:26

trapped, like one department has info another team needs, but they can't easily get

6:30

it.

6:31

CKAN offers the same powerful cataloging and access tools, but set up for private

6:35

networks.

6:36

It lets internal people find, say, crucial research data or financial models fast,

6:41

but with really strict controls over who sees what.

6:43

So it's basically a sophisticated engine for managing sensitive internal knowledge.

6:48

Right. So whether it's a government publishing health stats or a bank managing

6:52

internal risk stuff, the core value is standardization and accessibility.

6:56

Yeah, makes sense. But moving beyond just publishing data, what makes CKAN

7:00

recognized as this like global good?

7:03

Why is it more than just really good software?

7:05

Yeah, if you zoom out to the bigger picture, the impact is actually huge.

7:08

CKAN is officially recognized as a digital public good, a DPG.

7:12

It's listed in the digital public goods registry and that recognition, it's tied directly

7:16

to how the platform helps achieve the United Nations Sustainable Development Goals,

7:20

the SDGs.

7:21

That's a massive claim. We're talking about goals like fighting poverty, climate

7:25

action, better health.

7:26

How does a data management system actually help with that?

7:29

Well, think about it. Transparency and accessible information are like foundational

7:34

for solving big global problems.

7:36

The sources point out CKAN actively helps tackle nine of the 17 SDGs from the UN's

7:41

2030 agenda.

7:42

So, for instance, by centralizing disaster response data, which connects to SDG 13,

7:47

climate action,

7:48

it lets NGOs and emergency teams figure out where resources are needed, who's

7:52

vulnerable, much faster than digging through scattered reports.

7:56

By just enabling efficient, standardized data flow, it directly contributes to

8:00

these major global efforts.

8:01

And keeping something this powerful neutral and accessible, that must need special

8:05

governance, especially being open source.

8:07

Who makes sure it stays a public good, you know, doesn't get taken over by some

8:10

interest?

8:11

That's a really important point. That responsibility lies with the Open Knowledge

8:14

Foundation.

8:15

They're a nonprofit. They essentially hold CKAN's assets in trust.

8:19

And having this nonprofit steward is the key protection against that vendor lock-in

8:23

we talked about earlier.

8:25

It ensures the platform sticks to best practices, keeps things open, and really

8:29

safeguards its status as a global public asset for everyone, public or private

8:33

users.

8:34

OK, that governance piece is vital. So let's bring it back down to the user

8:37

experience.

8:38

Whether I'm a researcher in Canada using the government portal or maybe an analyst

8:42

at an energy company using their internal version,

8:44

what tools does CKAN actually give me to make sense of these huge data sets?

8:50

Right. So as a full data portal platform, it offers several layers of useful

8:54

features.

8:55

First, the basics. It catalogs, stores and gives access to data sets efficiently,

9:01

but it's way more than just a list of files.

9:03

It usually has a pretty rich user friendly front end, you know, the website part

9:07

you actually see and click through.

9:09

And really importantly for developers or power users, it provides a full API, that's

9:14

an application programming interface.

9:16

And that's for both the data itself and the catalog about the data.

9:19

OK, let's clarify that API bit for beginners. If the data portal is like the

9:23

library building, what's the API?

9:25

Good analogy. If the portal is the library, the API is like the digital librarian

9:30

that can talk directly to other computer programs.

9:33

It means developers can build tools that automatically talk to the CKAN system so

9:37

they can automate data updates,

9:39

pull data into other dashboards or apps, let different software systems query the

9:43

catalog without a human needing to click around.

9:46

That's how you get those really sophisticated applications that use real-time

9:49

government data or internal corporate metrics.
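
To make the "digital librarian" idea a little more concrete, here is a minimal Python sketch of how a program might ask a CKAN portal for matching datasets. It assumes the portal exposes CKAN's standard Action API, where the `package_search` action lives under `/api/3/action/`. The helper names `build_search_url` and `summarize_search`, and the `SAMPLE_RESPONSE` text, are made up for illustration and are not part of CKAN; the sample only mirrors the general shape of a search response so the sketch runs without network access.

```python
import json
from urllib.parse import urlencode


def build_search_url(portal_url, query, rows=5):
    """Build a package_search URL for a CKAN portal (hypothetical helper)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal_url.rstrip('/')}/api/3/action/package_search?{params}"


def summarize_search(response_text):
    """Return (total match count, list of dataset titles) from a search response."""
    payload = json.loads(response_text)
    # Action API responses carry a top-level "success" flag.
    if not payload.get("success"):
        raise RuntimeError("CKAN reported a failed request")
    result = payload["result"]
    # Fall back to the dataset's URL-friendly name when the title is empty.
    titles = [d.get("title") or d["name"] for d in result["results"]]
    return result["count"], titles


# Hand-written stand-in for a real response (assumption, not real portal data).
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "daily-weather-records", "title": "Daily Weather Records"},
            {"name": "population-estimates", "title": ""},
        ],
    },
})

if __name__ == "__main__":
    print(build_search_url("https://catalog.data.gov", "weather stations"))
    print(summarize_search(SAMPLE_RESPONSE))
```

In real use, a tool would fetch the URL over HTTP and pass the response body to `summarize_search`; keeping the URL building and the parsing separate makes both easy to test without touching a live portal.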

9:52

Which is super important for big organizations integrating things.

9:55

Exactly. And CKAN often includes visualization tools right in the box.

10:00

This lets users get a quick visual sense of the data, charts, maps, that kind of

10:04

thing, without needing to download massive raw files first.

10:06

It just helps speed up understanding.

10:08

And for anyone listening who's maybe intrigued by all this, wants to learn more or

10:11

even get involved, the community seems really open.

10:14

Our sources mentioned lots of ways in.

10:16

Free webinars, these CKAN monthly live meetups you can join, mailing lists like

10:21

ckan-dev, chat channels on Gitter, using GitHub issues for help.

10:25

It really sounds like a living ecosystem.

10:28

OK, so let's try and wrap this up. Key takeaways.

10:31

We've learned CKAN, which is looked after by the nonprofit Open Knowledge

10:34

Foundation, is basically the world's top open source platform for data portals.

10:39

It takes massive amounts of data, government, science, even private company data,

10:42

and makes it accessible, standardized, like turning Australian public info or

10:46

internal finance data into usable resources.

10:48

And crucially, it's actively helping meet major UN sustainable development goals

10:53

just by improving transparency and how data flows.

10:56

Yeah, and that leads to, I think, a really interesting question for you to mull

10:59

over.

10:59

Given that we know it helps achieve these global SDGs and it's so fundamental to

11:05

government transparency, how might relying more on robust open source systems like

11:11

CKAN actually change

11:13

how transparent and effective global development projects or even just governance

11:17

itself become in the next, say, 10 years?

11:20

It suggests a future built not just on having data, but on truly shared, accessible

11:25

knowledge.

11:25

That's a powerful thought to end on.

11:27

And once again, we really want to thank SafeServer for supporting this Deep Dive.

11:31

SafeServer is there for your digital transformation and hosting needs, making sure

11:34

essential platforms like CKAN can run reliably, securely.

11:38

You can find out more at www.safeserver.de.

11:41

Go forth, be informed, and we'll catch you on the next one.
