1 00:00:00,000 --> 00:00:04,320 Welcome to the Deep Dive. We're here to give you the context, the facts, everything 2 00:00:04,320 --> 00:00:07,840 you need to feel really informed fast. Today we're going to do something pretty cool. 3 00:00:07,840 --> 00:00:12,570 We're looking behind the scenes at the basic plumbing that runs the world's open 4 00:00:12,570 --> 00:00:12,960 data. 5 00:00:12,960 --> 00:00:17,890 We're digging into CKAN. It's this powerful system, kind of invisible usually, but 6 00:00:17,890 --> 00:00:22,120 it's like the digital backbone for governments and big companies everywhere, making 7 00:00:22,120 --> 00:00:24,840 all this complex info accessible, usable. 8 00:00:25,200 --> 00:00:29,280 And we couldn't really get into this kind of global infrastructure without some 9 00:00:29,280 --> 00:00:33,870 solid support. So a big thank you to SafeServer. They handle the really crucial 10 00:00:33,870 --> 00:00:36,600 job of hosting software like CKAN, these high-demand platforms. 11 00:00:36,600 --> 00:00:40,210 They help organizations with their digital transformation, making sure data is secure, 12 00:00:40,210 --> 00:00:43,570 always there when you need it. So if you're thinking about your own digital setup, 13 00:00:43,570 --> 00:00:49,200 maybe boosting reliability with good hosting, check them out at www.safeserver.de. 14 00:00:49,320 --> 00:00:53,180 Right. So today we're focused purely on CKAN. That stands for Comprehensive 15 00:00:53,180 --> 00:00:59,270 Knowledge Archive Network. It's basically the leading open source data management 16 00:00:59,270 --> 00:01:04,680 system, or DMS. It's a really key piece of tech globally. It powers these huge data 17 00:01:04,680 --> 00:01:05,680 portals and hubs. 18 00:01:05,680 --> 00:01:09,690 And our goal today isn't just to list who uses it.
It's really to give you, the 19 00:01:09,690 --> 00:01:13,950 listener, a straightforward kind of beginner-friendly take on why it's more than 20 00:01:13,950 --> 00:01:16,920 just software, why it's seen as a global public good. 21 00:01:16,960 --> 00:01:20,620 Okay, let's unpack that a bit. This idea of a data management system, a DMS, sounds 22 00:01:20,620 --> 00:01:24,520 a bit technical. But if whole countries are relying on CKAN, what is it actually 23 00:01:24,520 --> 00:01:27,400 doing? How does it make something so complicated seem simple? 24 00:01:27,400 --> 00:01:32,680 Okay, um, maybe think of CKAN like the world's best library catalog, but 25 00:01:32,680 --> 00:01:37,620 specifically for digital data sets. You know how data, especially from governments 26 00:01:37,620 --> 00:01:41,320 or science, it often just piles up, huge amounts, kind of disorganized. 27 00:01:41,560 --> 00:01:46,740 CKAN is like the essential plumbing for that information. It's open source, and it's 28 00:01:46,740 --> 00:01:51,190 really designed to make it super easy to publish it, share it, and then crucially 29 00:01:51,190 --> 00:01:55,540 use it. It basically turns all that raw info into something standardized, something 30 00:01:55,540 --> 00:01:56,560 you can actually search through. 31 00:01:56,560 --> 00:02:00,000 And that open source part feels really important here, right? Especially when you 32 00:02:00,000 --> 00:02:03,640 talk about critical infrastructure. I mean, for sensitive government data or maybe 33 00:02:03,640 --> 00:02:07,520 financial stuff, you might think some private closed off software is safer. 34 00:02:08,080 --> 00:02:11,880 So why is CKAN being open source actually a good thing? Why do governments and 35 00:02:11,880 --> 00:02:13,160 companies trust it? 36 00:02:13,160 --> 00:02:18,370 Oh, it's a massive advantage. Really, it boils down to trust and long-term 37 00:02:18,370 --> 00:02:23,540 stability.
With proprietary software, you can get stuck with one vendor, you know, 38 00:02:23,540 --> 00:02:24,600 vendor lock-in. 39 00:02:24,600 --> 00:02:29,180 A government's whole digital strategy could depend on one company's, well, their 40 00:02:29,180 --> 00:02:33,880 decisions, their pricing. But CKAN being open source means the code is out there. 41 00:02:33,920 --> 00:02:37,920 Anyone can look at it, audit it, which kind of intuitively maybe makes it more 42 00:02:37,920 --> 00:02:41,840 secure because you have potentially thousands of security experts looking at it, 43 00:02:41,840 --> 00:02:43,400 not just one company's team. 44 00:02:43,400 --> 00:02:46,560 Plus, you can adopt it without being tied to a single corporation. 45 00:02:46,560 --> 00:02:49,430 Yeah, that makes total sense for government thinking long term. And our sources 46 00:02:49,430 --> 00:02:53,040 mentioned its tech side too. It's mostly Python, is that right? And it has this 47 00:02:53,040 --> 00:02:55,520 huge community activity, like 4.9 48 00:02:55,520 --> 00:02:59,850 thousand stars, 2.1 thousand forks on GitHub. For people who aren't developers, 49 00:02:59,850 --> 00:03:03,730 what do those numbers actually mean? What does that tell us about how healthy this 50 00:03:03,730 --> 00:03:04,680 platform is? 51 00:03:04,680 --> 00:03:09,920 Those numbers, they basically confirm CKAN isn't some, you know, niche project. It's 52 00:03:09,920 --> 00:03:14,600 a globally recognized standard. The fact it's mainly Python means it's built on a 53 00:03:14,600 --> 00:03:18,120 language that's mature, stable, really good for handling big 54 00:03:18,120 --> 00:03:23,070 data stuff. And the 4.9k stars? That means thousands of developers and hundreds of 55 00:03:23,070 --> 00:03:27,500 organizations basically trust this code enough to like bookmark it, use it in their 56 00:03:27,500 --> 00:03:32,930 own work. The 2.1k forks?
That shows people are constantly taking it, tweaking it, 57 00:03:32,930 --> 00:03:36,420 improving it for their own needs. It proves it's alive, you know, a dynamic 58 00:03:36,420 --> 00:03:38,040 resource, not just static software. 59 00:03:38,040 --> 00:03:42,490 It's pretty amazing that this one platform, kept going by its community, powers hundreds 60 00:03:42,490 --> 00:03:46,740 of these data portals all over the world. So if any of us listening have like 61 00:03:46,740 --> 00:03:48,000 looked up open government data, 62 00:03:48,000 --> 00:03:50,760 the chances are we've used CKAN. Where exactly is it running? 63 00:03:50,760 --> 00:03:55,300 Absolutely. The reach is, well, it's genuinely global. If you're looking at 64 00:03:55,300 --> 00:03:59,880 official open data portals, you're almost certainly bumping into CKAN. It's behind 65 00:03:59,880 --> 00:04:03,040 major national sites like catalog.data.gov in the US, 66 00:04:03,040 --> 00:04:08,890 open.canada.ca/data for Canada. But it goes broader too. It's also the engine for 67 00:04:08,890 --> 00:04:13,720 vital humanitarian data like on data.humdata.org. So yeah, it's fair to say it's 68 00:04:13,720 --> 00:04:17,280 the world's leading open source data portal platform, no question. 69 00:04:17,440 --> 00:04:20,480 And what's really fascinating is just grasping the amount of information being 70 00:04:20,480 --> 00:04:24,610 handled here. Let's talk government use first, because that's where you really see 71 00:04:24,610 --> 00:04:26,920 the commitment to public transparency. 72 00:04:26,920 --> 00:04:30,070 We're not just talking one or two countries leading the way, are we? This sounds 73 00:04:30,070 --> 00:04:31,680 like a standard across continents. 74 00:04:31,680 --> 00:04:36,770 Totally. Our sources confirm it. National governments, regional bodies across the 75 00:04:36,770 --> 00:04:41,860 EU, North and South America, Asia, Oceania.
This wide adoption really signals a 76 00:04:41,860 --> 00:04:44,880 kind of global agreement on how to best handle open data. 77 00:04:45,160 --> 00:04:49,510 And to give you a sense of the scale, think about the complexity. The government of 78 00:04:49,510 --> 00:04:54,620 Canada. They use CKAN for like tens of thousands of data sets, federal stuff, 79 00:04:54,620 --> 00:04:58,280 everything from, I don't know, weather records to population stats. 80 00:04:58,280 --> 00:05:02,750 Or look at Singapore. The Singapore government uses it for this massive national 81 00:05:02,750 --> 00:05:06,890 portal covering everything economy, education, environment, finance, health, all in 82 00:05:06,890 --> 00:05:07,640 one place. 83 00:05:07,640 --> 00:05:11,040 Wow. And I think the most mind boggling stat might be from Australia. 84 00:05:11,240 --> 00:05:15,240 Yeah, probably. The Australian government uses CKAN to pull together and publish 85 00:05:15,240 --> 00:05:17,920 data from over 800 different organizations. 86 00:05:17,920 --> 00:05:21,980 Just think about that for a second, making data from 800 separate agencies, all 87 00:05:21,980 --> 00:05:25,920 maybe doing things slightly differently, searchable through one single interface. 88 00:05:25,920 --> 00:05:30,670 CKAN is that crucial tool that brings it all together, enforces some consistency 89 00:05:30,670 --> 00:05:32,920 where it would otherwise be chaos. 90 00:05:33,200 --> 00:05:38,030 OK, so it handles all this public data, sensitive stuff for big governments. I 91 00:05:38,030 --> 00:05:42,280 wonder, is that security and structure why companies like it, too? 92 00:05:42,280 --> 00:05:44,990 It's not just for the public sector, right? This is where it gets really 93 00:05:44,990 --> 00:05:45,920 interesting for me. 94 00:05:45,920 --> 00:05:51,000 How does a system built for transparency manage like confidential company data? 95 00:05:51,000 --> 00:05:56,240 Exactly. 
And that really speaks to how robust and flexible CKAN is. 96 00:05:56,400 --> 00:06:00,800 Yes, major companies use it, too. They adopt it to manage their own internal data 97 00:06:00,800 --> 00:06:04,870 assets, which, you know, obviously needs a different security approach than just 98 00:06:04,870 --> 00:06:06,040 publishing everything online. 99 00:06:06,040 --> 00:06:09,350 Can you say a bit more about the difference? Like, when a big drug company or an 100 00:06:09,350 --> 00:06:12,960 energy firm uses CKAN internally, what's the goal there? 101 00:06:12,960 --> 00:06:18,150 Well, the goal shifts, right? It's less about public transparency and more about 102 00:06:18,150 --> 00:06:21,880 internal governance and breaking down data silos. 103 00:06:21,880 --> 00:06:26,420 You know how in big organizations, resources, energy, pharma, finance data gets 104 00:06:26,420 --> 00:06:30,850 trapped, like one department has info another team needs, but they can't easily get 105 00:06:30,850 --> 00:06:31,120 it. 106 00:06:31,120 --> 00:06:35,670 CKAN offers the same powerful cataloging and access tools, but set up for private 107 00:06:35,670 --> 00:06:36,480 networks. 108 00:06:36,480 --> 00:06:41,060 It lets internal people find, say, crucial research data or financial models fast, 109 00:06:41,060 --> 00:06:43,960 but with really strict controls over who sees what. 110 00:06:43,960 --> 00:06:48,720 So it's basically a sophisticated engine for managing sensitive internal knowledge. 111 00:06:48,720 --> 00:06:52,120 Right. So whether it's a government publishing health stats or a bank managing 112 00:06:52,120 --> 00:06:56,000 internal risk stuff, the core value is standardization accessibility. 113 00:06:56,000 --> 00:07:00,740 Yeah, makes sense. But moving beyond just publishing data, what makes CKAN 114 00:07:00,740 --> 00:07:03,080 recognized as this like global good? 115 00:07:03,080 --> 00:07:05,400 Why is it more than just really good software? 
116 00:07:05,400 --> 00:07:08,320 Yeah, if you zoom out to the bigger picture, the impact is actually huge. 117 00:07:08,320 --> 00:07:11,920 CKAN is officially recognized as a digital public good, a DPG. 118 00:07:12,040 --> 00:07:16,170 It's listed in the digital public goods registry and that recognition, it's tied directly 119 00:07:16,170 --> 00:07:20,140 to how the platform helps achieve the United Nations Sustainable Development Goals, 120 00:07:20,140 --> 00:07:21,080 the SDGs. 121 00:07:21,080 --> 00:07:25,770 That's a massive claim. We're talking about goals like fighting poverty, climate 122 00:07:25,770 --> 00:07:26,920 action, better health. 123 00:07:26,920 --> 00:07:29,920 How does a data management system actually help with that? 124 00:07:29,920 --> 00:07:34,030 Well, think about it. Transparency and accessible information are like foundational 125 00:07:34,030 --> 00:07:36,000 for solving big global problems. 126 00:07:36,000 --> 00:07:41,030 The sources point out CKAN actively helps tackle nine of the 17 SDGs from the UN's 127 00:07:41,030 --> 00:07:42,640 2030 agenda. 128 00:07:42,640 --> 00:07:47,540 So, for instance, by centralizing disaster response data, that connects to SDG 13, 129 00:07:47,540 --> 00:07:48,560 climate action, 130 00:07:48,560 --> 00:07:52,480 it lets NGOs and emergency teams figure out where resources are needed, who's 131 00:07:52,480 --> 00:07:56,080 vulnerable, much faster than digging through scattered reports. 132 00:07:56,080 --> 00:08:00,120 By just enabling efficient, standardized data flow, it directly contributes to 133 00:08:00,120 --> 00:08:01,360 these major global efforts. 134 00:08:01,360 --> 00:08:05,140 And keeping something this powerful neutral and accessible, that must need special 135 00:08:05,140 --> 00:08:07,120 governance, especially being open source. 136 00:08:07,120 --> 00:08:10,600 Who makes sure it stays a public good, you know, doesn't get taken over by some 137 00:08:10,600 --> 00:08:11,280 interest?
138 00:08:11,280 --> 00:08:14,870 That's a really important point. That responsibility lies with the Open Knowledge 139 00:08:14,870 --> 00:08:15,640 Foundation. 140 00:08:15,640 --> 00:08:19,640 They're a nonprofit. They essentially hold CKAN's assets in trust. 141 00:08:19,640 --> 00:08:23,690 And having this nonprofit steward is the key protection against that vendor lock-in 142 00:08:23,690 --> 00:08:25,120 we talked about earlier. 143 00:08:25,120 --> 00:08:29,370 It ensures the platform sticks to best practices, keeps things open, and really 144 00:08:29,370 --> 00:08:33,540 safeguards its status as a global public asset for everyone, public or private 145 00:08:33,540 --> 00:08:34,000 users. 146 00:08:34,000 --> 00:08:37,770 OK, that governance piece is vital. So let's bring it back down to the user 147 00:08:37,770 --> 00:08:38,440 experience. 148 00:08:38,440 --> 00:08:42,270 Whether I'm a researcher in Canada using the government portal or maybe an analyst 149 00:08:42,270 --> 00:08:44,920 at an energy company using their internal version, 150 00:08:44,920 --> 00:08:50,240 what tools does CKAN actually give me to make sense of these huge data sets? 151 00:08:50,240 --> 00:08:54,970 Right. So as a full data portal platform, it offers several layers of useful 152 00:08:54,970 --> 00:08:55,880 features. 153 00:08:55,880 --> 00:09:01,640 First, the basics. It catalogs, stores and gives access to data sets efficiently, 154 00:09:01,640 --> 00:09:03,680 but it's way more than just a list of files. 155 00:09:03,680 --> 00:09:07,650 It usually has a pretty rich, user-friendly front end, you know, the website part 156 00:09:07,650 --> 00:09:09,200 you actually see and click through. 157 00:09:09,200 --> 00:09:14,440 And really importantly for developers or power users, it provides a full API, that's 158 00:09:14,440 --> 00:09:16,240 an application programming interface.
159 00:09:16,240 --> 00:09:19,320 And that's for both the data itself and the catalog about the data. 160 00:09:19,320 --> 00:09:23,390 OK, let's clarify that API bit for beginners. If the data portal is like the 161 00:09:23,390 --> 00:09:25,400 library building, what's the API? 162 00:09:25,400 --> 00:09:30,840 Good analogy. If the portal is the library, the API is like the digital librarian 163 00:09:30,840 --> 00:09:33,400 that can talk directly to other computer programs. 164 00:09:33,400 --> 00:09:37,850 It means developers can build tools that automatically talk to the CKAN system so 165 00:09:37,850 --> 00:09:39,360 they can automate data updates, 166 00:09:39,360 --> 00:09:43,420 pull data into other dashboards or apps, let different software systems query the 167 00:09:43,420 --> 00:09:46,000 catalog without a human needing to click around. 168 00:09:46,000 --> 00:09:49,410 That's how you get those really sophisticated applications that use real-time 169 00:09:49,410 --> 00:09:52,160 government data or internal corporate metrics. 170 00:09:52,160 --> 00:09:55,480 Which is super important for big organizations integrating things. 171 00:09:55,480 --> 00:10:00,320 Exactly. And CKAN often includes visualization tools right in the box. 172 00:10:00,320 --> 00:10:04,270 This lets users get a quick visual sense of the data, charts, maps, that kind of 173 00:10:04,270 --> 00:10:06,920 thing, without needing to download massive raw files first. 174 00:10:06,920 --> 00:10:08,480 It just helps speed up understanding. 175 00:10:08,480 --> 00:10:11,920 And for anyone listening who's maybe intrigued by all this, wants to learn more or 176 00:10:11,920 --> 00:10:14,440 even get involved, the community seems really open. 177 00:10:14,440 --> 00:10:16,640 Our sources mentioned lots of ways in.
178 00:10:16,640 --> 00:10:21,230 Free webinars, these CKAN monthly live meetups you can join, mailing lists like 179 00:10:21,230 --> 00:10:25,000 ckan-dev, chat channels on Gitter, using GitHub issues for help. 180 00:10:25,000 --> 00:10:28,680 It really sounds like a living ecosystem. 181 00:10:28,680 --> 00:10:31,480 OK, so let's try and wrap this up. Key takeaways. 182 00:10:31,480 --> 00:10:34,780 We've learned CKAN, which is looked after by the nonprofit Open Knowledge 183 00:10:34,780 --> 00:10:39,040 Foundation, is basically the world's top open source platform for data portals. 184 00:10:39,040 --> 00:10:42,710 It takes massive amounts of data, government, science, even private company data, 185 00:10:42,710 --> 00:10:46,360 and makes it accessible, standardized, like turning Australian public info or 186 00:10:46,360 --> 00:10:48,880 internal finance data into usable resources. 187 00:10:48,880 --> 00:10:53,480 And crucially, it's actively helping meet major UN Sustainable Development Goals 188 00:10:53,480 --> 00:10:56,360 just by improving transparency and how data flows. 189 00:10:56,360 --> 00:10:59,510 Yeah, and that leads to, I think, a really interesting question for you to mull 190 00:10:59,510 --> 00:10:59,960 over. 191 00:10:59,960 --> 00:11:05,380 Given that we know it helps achieve these global SDGs and it's so fundamental to 192 00:11:05,380 --> 00:11:11,490 government transparency, how might relying more on robust open source systems like 193 00:11:11,490 --> 00:11:13,280 CKAN actually change 194 00:11:13,280 --> 00:11:17,850 how transparent and effective global development projects or even just governance 195 00:11:17,850 --> 00:11:20,560 itself become in the next, say, 10 years? 196 00:11:20,560 --> 00:11:25,030 It suggests a future built not just on having data, but on truly shared, accessible 197 00:11:25,030 --> 00:11:25,880 knowledge. 198 00:11:25,880 --> 00:11:27,960 That's a powerful thought to end on.
199 00:11:27,960 --> 00:11:31,240 And once again, we really want to thank SafeServer for supporting this Deep Dive. 200 00:11:31,240 --> 00:11:34,600 SafeServer is there for your digital transformation and hosting needs, making sure 201 00:11:34,600 --> 00:11:38,120 essential platforms like CKAN can run reliably, securely. 202 00:11:38,120 --> 00:11:41,320 You can find out more at www.safeserver.de. 203 00:11:41,320 --> 00:11:43,040 That's all the time we have for this Deep Dive. 204 00:11:43,040 --> 00:11:45,040 Go forth, be informed, and we'll catch you on the next one.
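
The API part of the conversation (roughly 09:09 to 09:52) describes other programs querying the catalog without a human clicking around. As a rough illustration only, here is a minimal Python sketch of that idea. It assumes the portal exposes CKAN's Action API at the conventional `/api/3/action/` path and uses the `package_search` action; the helper names (`build_search_url`, `extract_titles`, `search_datasets`), the example portal, and the query are illustrative choices, not something taken from the episode.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(portal_url: str, query: str, rows: int = 5) -> str:
    """Build a package_search URL for a CKAN portal (path is the conventional one)."""
    base = portal_url.rstrip("/")
    params = urlencode({"q": query, "rows": rows})
    return f"{base}/api/3/action/package_search?{params}"


def extract_titles(payload: dict) -> list[str]:
    """Pull dataset titles out of a package_search response body."""
    if not payload.get("success"):
        raise RuntimeError("CKAN API reported a failure")
    # Each result is a dataset record; fall back to its name if no title is set.
    return [ds.get("title") or ds["name"] for ds in payload["result"]["results"]]


def search_datasets(portal_url: str, query: str, rows: int = 5,
                    timeout: float = 10.0) -> list[str]:
    """Query a live portal over HTTP and return the matching dataset titles."""
    with urlopen(build_search_url(portal_url, query, rows), timeout=timeout) as resp:
        return extract_titles(json.load(resp))


if __name__ == "__main__":
    # Assumption: this portal is reachable and serves the standard Action API.
    for title in search_datasets("https://catalog.data.gov", "climate"):
        print(title)
```

The URL building and response parsing are kept separate from the network call so the logic can be checked offline; a dashboard or script would call `search_datasets` on a schedule in the way the hosts describe.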