1 00:00:00,000 --> 00:00:04,660 Web analytics, it's one of those things that, for most people with a website today, 2 00:00:04,660 --> 00:00:04,880 feels 3 00:00:04,880 --> 00:00:07,380 like a necessary evil. 4 00:00:07,380 --> 00:00:12,380 The tools are technical, they're often built to track every single click, and it 5 00:00:12,380 --> 00:00:12,860 just feels 6 00:00:12,860 --> 00:00:15,780 like a pretty unfair and impenetrable business. 7 00:00:15,780 --> 00:00:16,780 It does. 8 00:00:16,780 --> 00:00:18,480 And that's really the core problem we're looking at today. 9 00:00:18,480 --> 00:00:23,770 For anyone who is just sick of the information overload on data privacy and trying 10 00:00:23,770 --> 00:00:24,500 to figure 11 00:00:24,500 --> 00:00:28,520 out how to be compliant without losing all your insights, well, Offen Fair Web 12 00:00:28,520 --> 00:00:29,080 Analytics 13 00:00:29,080 --> 00:00:31,320 is the subject of our deep dive. 14 00:00:31,320 --> 00:00:34,960 It's a really clear, actionable model for doing this ethically. 15 00:00:34,960 --> 00:00:38,680 And that's our mission today, to unpack Offen for you, especially if you're a 16 00:00:38,680 --> 00:00:39,400 beginner. 17 00:00:39,400 --> 00:00:43,960 We want to look at its key features, the tech behind it, like encryption and self-hosting, 18 00:00:43,960 --> 00:00:46,000 and really get into this mindset shift. 19 00:00:46,000 --> 00:00:47,000 It is a big shift. 20 00:00:47,000 --> 00:00:51,120 Treating the user as an equal party in the conversation, not just a data point. 21 00:00:51,120 --> 00:00:52,120 Right. 22 00:00:52,120 --> 00:00:54,500 The vision here is, well, kind of revolutionary. 23 00:00:54,500 --> 00:00:59,350 It lets you, the operator, get valuable insights, unique sessions, top pages, that 24 00:00:59,350 --> 00:01:00,200 kind of thing. 25 00:01:00,200 --> 00:01:03,360 While your users can actually see and control their own data.
26 00:01:03,360 --> 00:01:08,080 And the sources describe it as open, lightweight, self-hosted, and free. 27 00:01:08,080 --> 00:01:10,000 Permanently free, which is a big deal. 28 00:01:10,000 --> 00:01:13,500 And before we jump right into all that, we want to thank SafeServer for supporting 29 00:01:13,500 --> 00:01:13,800 this 30 00:01:13,800 --> 00:01:15,240 deep dive. 31 00:01:15,240 --> 00:01:20,320 SafeServer focuses on hosting software and supporting your digital transformation. 32 00:01:20,320 --> 00:01:25,790 So for tools just like this one, ethical self-hosted tools, they provide the 33 00:01:25,790 --> 00:01:26,780 infrastructure. 34 00:01:26,780 --> 00:01:31,040 To learn more, just head over to www.safeserver.de. 35 00:01:31,040 --> 00:01:34,040 That's www.safeserver.de. 36 00:01:34,040 --> 00:01:38,800 Okay, so let's start with section one, the core philosophy of fair analytics. 37 00:01:38,800 --> 00:01:42,200 For a beginner, what does fair actually mean here? 38 00:01:42,200 --> 00:01:44,080 Offen really stands on three pillars. 39 00:01:44,080 --> 00:01:46,160 The first is that it's secure and free. 40 00:01:46,160 --> 00:01:49,520 The code is open source, so anyone can check it, and they promise it will always be 41 00:01:49,520 --> 00:01:49,720 free 42 00:01:49,720 --> 00:01:50,720 to use. 43 00:01:50,720 --> 00:01:53,400 And that leads right into the second pillar, which is self-hosted. 44 00:01:53,400 --> 00:01:57,200 I think this is where it gets really interesting for anyone dealing with, say, GDPR. 45 00:01:57,200 --> 00:01:58,200 Absolutely. 46 00:01:58,200 --> 00:01:59,200 Self-hosting is fundamental. 47 00:01:59,200 --> 00:02:00,840 I mean, you own the data. 48 00:02:00,840 --> 00:02:02,520 You control the whole relationship. 49 00:02:02,520 --> 00:02:06,210 So let's pause on that, because self-hosting is really the key to transparency here, 50 00:02:06,210 --> 00:02:06,560 isn't 51 00:02:06,560 --> 00:02:07,560 it?
52 00:02:07,560 --> 00:02:08,900 What does it actually stop from happening? 53 00:02:08,900 --> 00:02:10,920 It stops third-party data leakage. 54 00:02:10,920 --> 00:02:12,920 It stops data brokerage. 55 00:02:12,920 --> 00:02:19,000 By self-hosting, you're guaranteeing zero ads, zero outside data companies, and the 56 00:02:19,000 --> 00:02:19,480 system 57 00:02:19,480 --> 00:02:22,320 exclusively uses something called first-party cookies. 58 00:02:22,320 --> 00:02:23,320 Right. 59 00:02:23,320 --> 00:02:24,320 We hear cookies all the time. 60 00:02:24,320 --> 00:02:26,920 Can you just quickly break down the difference? 61 00:02:26,920 --> 00:02:30,080 First-party versus the third-party ones that cause all the privacy headaches? 62 00:02:30,080 --> 00:02:31,080 Yeah, for sure. 63 00:02:31,080 --> 00:02:35,000 A first-party cookie is just set by the website you're on, and it's for basic 64 00:02:35,000 --> 00:02:35,800 functions, like 65 00:02:35,800 --> 00:02:40,400 remembering your login or, in Offen's case, if you've agreed to analytics. 66 00:02:40,400 --> 00:02:41,400 Okay. 67 00:02:41,400 --> 00:02:45,750 Third-party cookies are set by totally different domains that are embedded on the 68 00:02:45,750 --> 00:02:46,280 site. 69 00:02:46,280 --> 00:02:49,080 Those are the ones ad networks use to follow you around the internet. 70 00:02:49,080 --> 00:02:50,080 I see. 71 00:02:50,080 --> 00:02:54,070 So Offen sticking to first-party cookies really limits the tracking just to your 72 00:02:54,070 --> 00:02:54,200 own 73 00:02:54,200 --> 00:02:55,200 site. 74 00:02:55,200 --> 00:02:56,200 Exactly. 75 00:02:56,200 --> 00:02:59,380 And that brings us to the third pillar, which is maybe the most radical: fair 76 00:02:59,380 --> 00:03:00,020 and 77 00:03:00,020 --> 00:03:01,020 by choice. 78 00:03:01,020 --> 00:03:02,760 The opt-in requirement. 79 00:03:02,760 --> 00:03:04,320 Strictly opt-in only.
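To make the first-party versus third-party distinction above concrete, here is a minimal Python sketch. It is an illustration only, not how Offen or any browser decides this: the function name `is_first_party` is hypothetical, and the domain comparison is deliberately simplified (real browsers compare registrable domains using the Public Suffix List).

```python
from urllib.parse import urlsplit


def is_first_party(page_url: str, cookie_domain: str) -> bool:
    """Return True if the cookie's domain belongs to the site being visited.

    Simplified assumption: a cookie is first-party when its domain equals
    the page's host or is a parent domain of it.
    """
    page_host = (urlsplit(page_url).hostname or "").lower()
    cookie_host = cookie_domain.lstrip(".").lower()
    if not page_host or not cookie_host:
        return False
    return page_host == cookie_host or page_host.endswith("." + cookie_host)


# A cookie set by the site itself versus one set by an embedded ad domain.
print(is_first_party("https://shop.example.com/cart", "example.com"))     # True
print(is_first_party("https://shop.example.com/cart", "ads.tracker.net"))  # False
```

Under this simplified rule, only cookies scoped to the visited site count as first-party, which is why restricting a tool to them keeps tracking from following a visitor across other sites.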
80 00:03:04,320 --> 00:03:09,600 A user has to actively say yes, and if they don't, the sources are really clear on 81 00:03:09,600 --> 00:03:10,200 this. 82 00:03:10,200 --> 00:03:11,800 They will never leave a trace. 83 00:03:11,800 --> 00:03:12,800 Wow. 84 00:03:12,800 --> 00:03:16,760 So no session data, no fingerprinting, not even a tracking script is loaded. 85 00:03:16,760 --> 00:03:20,950 That feels like it would cause a massive drop-off in data compared to, you know, 86 00:03:20,950 --> 00:03:21,680 the opt-out 87 00:03:21,680 --> 00:03:23,520 by default model. 88 00:03:23,520 --> 00:03:25,280 As an operator, this changes my job, right? 89 00:03:25,280 --> 00:03:27,200 I'm not just installing software. 90 00:03:27,200 --> 00:03:30,080 You're entering into a trust contract with the user. 91 00:03:30,080 --> 00:03:31,720 That's a perfect way to put it. 92 00:03:31,720 --> 00:03:34,640 The operator's responsibility really shifts. 93 00:03:34,640 --> 00:03:36,880 The sources list four key tasks. 94 00:03:36,880 --> 00:03:37,880 Okay. 95 00:03:37,880 --> 00:03:38,880 What are they? 96 00:03:38,880 --> 00:03:40,680 First, you have to self-host it and protect that data. 97 00:03:40,680 --> 00:03:43,360 Second, you integrate the code snippet. 98 00:03:43,360 --> 00:03:47,140 Third, and this is the fairness part, you have to make your users aware that they 99 00:03:47,140 --> 00:03:47,400 can 100 00:03:47,400 --> 00:03:48,860 access their data. 101 00:03:48,860 --> 00:03:50,480 So it's a built-in obligation. 102 00:03:50,480 --> 00:03:51,480 And the fourth? 103 00:03:51,480 --> 00:03:54,960 You use the fair, transparent insights you get to actually improve your services. 104 00:03:54,960 --> 00:03:55,960 Yeah. 105 00:03:55,960 --> 00:03:58,360 It becomes this ethical feedback loop. 106 00:03:58,360 --> 00:03:59,360 I like that. 107 00:03:59,360 --> 00:04:00,360 Okay. 108 00:04:00,360 --> 00:04:02,760 Let's move into section two and look under the hood. 
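The strictly opt-in rule described above can be sketched in a few lines of Python. This is a hedged illustration, not Offen's code: `record_pageview` and the in-memory `events` list are hypothetical stand-ins, and the only point is that anything short of an explicit yes results in nothing being stored.

```python
from typing import Optional


def record_pageview(store: list, consent: Optional[bool], page: str) -> bool:
    """Store a page view only when the user has explicitly opted in."""
    # None means "no decision yet"; it is treated exactly like a refusal.
    if consent is not True:
        return False
    store.append({"page": page})
    return True


events: list = []
record_pageview(events, None, "/about")    # no decision yet: nothing stored
record_pageview(events, False, "/about")   # declined: nothing stored
record_pageview(events, True, "/contact")  # opted in: stored
print(events)  # [{'page': '/contact'}]
```

Treating "no answer" the same as "no" is the design choice that separates opt-in from the opt-out-by-default model mentioned in the conversation.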
109 00:04:02,760 --> 00:04:06,560 How does Offen keep things secure while still giving you useful stats? 110 00:04:06,560 --> 00:04:08,400 It starts with data minimization, right? 111 00:04:08,400 --> 00:04:09,400 Right. 112 00:04:09,400 --> 00:04:10,400 It's core to the design. 113 00:04:10,400 --> 00:04:14,340 The whole goal is to collect the absolute minimum amount of data you need for 114 00:04:14,340 --> 00:04:15,040 meaningful 115 00:04:15,040 --> 00:04:16,040 stats. 116 00:04:16,040 --> 00:04:18,960 So what are some of the things it specifically avoids collecting? 117 00:04:18,960 --> 00:04:20,800 The source material is very explicit here. 118 00:04:20,800 --> 00:04:24,560 It does not look at or collect IP addresses at all. 119 00:04:24,560 --> 00:04:25,560 Okay. 120 00:04:25,560 --> 00:04:27,280 And it doesn't look at user agent strings. 121 00:04:27,280 --> 00:04:28,280 Which are? 122 00:04:28,280 --> 00:04:31,490 They're little bits of text your browser sends out that can reveal a lot about your 123 00:04:31,490 --> 00:04:31,920 operating 124 00:04:31,920 --> 00:04:34,540 system, your browser version, your device. 125 00:04:34,540 --> 00:04:36,080 It's a way to fingerprint you. 126 00:04:36,080 --> 00:04:37,080 I see. 127 00:04:37,080 --> 00:04:40,500 And ignoring those, it basically eliminates that possibility. 128 00:04:40,500 --> 00:04:41,500 Totally. 129 00:04:41,500 --> 00:04:45,200 But if they're minimizing so much, how do I know the data's secure once it's on my 130 00:04:45,200 --> 00:04:46,800 server? 131 00:04:46,800 --> 00:04:49,920 Doesn't self-hosting just mean I'm the one responsible for a data breach? 132 00:04:49,920 --> 00:04:55,080 That is where the technical brilliance of end-to-end encryption, or E2E, comes in. 133 00:04:55,080 --> 00:04:56,080 Okay. 134 00:04:56,080 --> 00:04:57,080 Think of it like this. 135 00:04:57,080 --> 00:05:02,340 When a user opts in, their browser, the client, encrypts the usage data.
136 00:05:02,340 --> 00:05:05,980 It locks it in a digital safe before it ever leaves their computer. 137 00:05:05,980 --> 00:05:09,420 So the data arrives at my server, but it's in a locked safe. 138 00:05:09,420 --> 00:05:10,800 Do I have the key? 139 00:05:10,800 --> 00:05:11,800 No. 140 00:05:11,800 --> 00:05:13,120 And that is the critical point. 141 00:05:13,120 --> 00:05:16,120 Your server, the thing storing the data, cannot decrypt it. 142 00:05:16,120 --> 00:05:17,960 It has no idea what's inside. 143 00:05:17,960 --> 00:05:19,720 It's just storing scrambled text. 144 00:05:19,720 --> 00:05:20,720 That's incredible. 145 00:05:20,720 --> 00:05:24,060 So it really addresses that fear of an accidental data leak. 146 00:05:24,060 --> 00:05:28,720 If my server gets compromised, the attacker just gets a bunch of useless encrypted 147 00:05:28,720 --> 00:05:29,120 data. 148 00:05:29,120 --> 00:05:30,120 Exactly. 149 00:05:30,120 --> 00:05:31,700 It makes those leaks harmless. 150 00:05:31,700 --> 00:05:37,120 So if we're only collecting this minimal encrypted data, what kind of insights do I 151 00:05:37,120 --> 00:05:37,480 actually 152 00:05:37,480 --> 00:05:38,640 get as an operator? 153 00:05:38,640 --> 00:05:41,200 I still need to know where my traffic is coming from. 154 00:05:41,200 --> 00:05:42,200 Oh, yeah. 155 00:05:42,200 --> 00:05:44,540 You get all the essentials you need to improve your service. 156 00:05:44,540 --> 00:05:49,590 You can still filter your data by URL, location, referrer, landing pages, and exit 157 00:05:49,590 --> 00:05:50,260 pages. 158 00:05:50,260 --> 00:05:52,680 And what about things like marketing campaigns? 159 00:05:52,680 --> 00:05:54,920 You can also filter by UTM parameters. 160 00:05:54,920 --> 00:05:56,400 Can you explain those quickly? 161 00:05:56,400 --> 00:05:58,400 They're just tags you add to the end of a URL.
162 00:05:58,400 --> 00:06:03,060 So if you send out a newsletter, a UTM tag can tell you a click came from your 163 00:06:03,060 --> 00:06:03,400 summer 164 00:06:03,400 --> 00:06:07,340 2024 newsletter instead of just a generic source. 165 00:06:07,340 --> 00:06:09,080 It's vital for measuring what's working. 166 00:06:09,080 --> 00:06:10,080 Got it. 167 00:06:10,080 --> 00:06:11,840 And how long does this data stick around? 168 00:06:11,840 --> 00:06:14,160 Is there a risk of it just piling up forever? 169 00:06:14,160 --> 00:06:15,160 Nope. 170 00:06:15,160 --> 00:06:17,320 There's a hard data retention rule built in. 171 00:06:17,320 --> 00:06:21,190 User data is stored for six months, and then it's automatically and permanently 172 00:06:21,190 --> 00:06:21,840 deleted. 173 00:06:21,840 --> 00:06:22,840 Six months. 174 00:06:22,840 --> 00:06:23,840 That's it. 175 00:06:23,840 --> 00:06:24,840 That's it. 176 00:06:24,840 --> 00:06:29,640 Let's jump into what I think is the most compelling part in Section 3. 177 00:06:29,640 --> 00:06:30,640 Transparency in action. 178 00:06:30,640 --> 00:06:32,200 They call this the auditorium. 179 00:06:32,200 --> 00:06:33,200 Right. 180 00:06:33,200 --> 00:06:34,400 And this is what really sets it apart. 181 00:06:34,400 --> 00:06:36,800 This is the user benefit side of the whole thing. 182 00:06:36,800 --> 00:06:41,640 Beyond just opting in or out, Offen gives users real control. 183 00:06:41,640 --> 00:06:44,720 So they can delete their data or opt out at any time? 184 00:06:44,720 --> 00:06:45,720 Any time. 185 00:06:45,720 --> 00:06:49,060 But the truly radical part is that users can review their own data. 186 00:06:49,060 --> 00:06:50,800 They don't just see a toggle switch. 187 00:06:50,800 --> 00:06:54,880 They get to see the actual metrics with explanations of what everything means. 188 00:06:54,880 --> 00:06:58,240 And the best way to understand this is to contrast the two views, right?
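As a small illustration of the UTM tags explained earlier in this section, the Python sketch below pulls them out of a link using only the standard library. The URL, the helper name `utm_tags`, and the campaign value are made up for the example; the `utm_source`, `utm_medium`, and `utm_campaign` keys follow the common naming convention and are not taken from the sources.

```python
from urllib.parse import parse_qs, urlsplit


def utm_tags(url: str) -> dict[str, str]:
    """Return the utm_* query parameters of a URL as a plain dictionary."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per key; keep the first value of each UTM tag.
    return {key: values[0] for key, values in query.items() if key.startswith("utm_")}


# Hypothetical newsletter link carrying three UTM tags.
link = (
    "https://example.com/pricing"
    "?utm_source=newsletter&utm_medium=email&utm_campaign=summer-2024"
)
print(utm_tags(link))
# {'utm_source': 'newsletter', 'utm_medium': 'email', 'utm_campaign': 'summer-2024'}
```

A click without these tags would yield an empty dictionary, which is the "generic source" case the hosts mention.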
189 00:06:58,240 --> 00:07:00,520 What the operator sees versus what the user sees. 190 00:07:00,520 --> 00:07:01,520 Exactly. 191 00:07:01,520 --> 00:07:02,740 So let's start with the big picture. 192 00:07:02,740 --> 00:07:05,080 What does the operator see on their dashboard? 193 00:07:05,080 --> 00:07:08,040 The operator's view is the aggregate data. 194 00:07:08,040 --> 00:07:10,040 All the data summed up across all your pages. 195 00:07:10,040 --> 00:07:11,320 So you'll see your totals. 196 00:07:11,320 --> 00:07:17,080 Like unique users, say 859 and maybe 3,372 unique sessions. 197 00:07:17,080 --> 00:07:19,160 And lists of top pages, things like that. 198 00:07:19,160 --> 00:07:20,160 Yep. 199 00:07:20,160 --> 00:07:22,080 Other URLs across the whole site. 200 00:07:22,080 --> 00:07:23,720 Standard stuff for managing your content. 201 00:07:23,720 --> 00:07:24,720 Okay. 202 00:07:24,720 --> 00:07:26,400 So that's anonymous, high-level data. 203 00:07:26,400 --> 00:07:29,520 Now what if I'm a user and I go into my auditorium, what do I see? 204 00:07:29,520 --> 00:07:32,520 The user's view is totally specific to you. 205 00:07:32,520 --> 00:07:36,240 It only shows data related to your activity since you opted in. 206 00:07:36,240 --> 00:07:40,290 So you might see one unique website tracked, because that's the only one you've 207 00:07:40,290 --> 00:07:41,040 contributed 208 00:07:41,040 --> 00:07:42,040 data to. 209 00:07:42,040 --> 00:07:43,040 And my sessions. 210 00:07:43,040 --> 00:07:47,030 You might see five unique sessions and a very specific list of the top pages that 211 00:07:47,030 --> 00:07:47,800 you visited. 212 00:07:47,800 --> 00:07:51,790 So if I came to your site five times and looked at the about page and the contact 213 00:07:51,790 --> 00:07:52,480 page, I 214 00:07:52,480 --> 00:07:56,160 would log in and see exactly those pages listed as my top pages. 215 00:07:56,160 --> 00:07:57,160 Precisely. 
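The contrast between the two views can be sketched with one summary function applied to two different slices of data. This is an illustrative Python sketch under loud assumptions: the event records, user IDs, and the `summarize` helper are invented, and the data is shown as plain records, whereas the conversation describes the stored data as encrypted on the server.

```python
from collections import Counter

# Invented sample events; in the system described above these would be encrypted.
events = [
    {"user": "u1", "session": "s1", "page": "/about"},
    {"user": "u1", "session": "s1", "page": "/contact"},
    {"user": "u1", "session": "s2", "page": "/about"},
    {"user": "u2", "session": "s3", "page": "/"},
    {"user": "u2", "session": "s3", "page": "/about"},
]


def summarize(rows: list) -> dict:
    """Compute the same headline metrics for any set of events."""
    return {
        "unique_users": len({row["user"] for row in rows}),
        "unique_sessions": len({row["session"] for row in rows}),
        "top_pages": Counter(row["page"] for row in rows).most_common(),
    }


# Operator: everything, aggregated. User: only their own footprint.
operator_view = summarize(events)
user_view = summarize([row for row in events if row["user"] == "u1"])

print(operator_view["unique_users"], operator_view["unique_sessions"])  # 2 3
print(user_view["unique_sessions"], user_view["top_pages"])
# 2 [('/about', 2), ('/contact', 1)]
```

Using the same function for both views mirrors the idea that the user gets the same kind of metrics as the operator, just limited to their own activity.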
216 00:07:57,160 --> 00:08:01,940 The operator sees hundreds of users in aggregate, but you see your exact footprint. 217 00:08:01,940 --> 00:08:04,960 It just gets rid of that black box feeling of web analytics. 218 00:08:04,960 --> 00:08:06,360 You know exactly what you shared. 219 00:08:06,360 --> 00:08:08,720 That is a profound trust builder. 220 00:08:08,720 --> 00:08:13,600 So for an operator who is sold on this, let's talk about section four, deployment. 221 00:08:13,600 --> 00:08:15,720 How easy is it to actually get started? 222 00:08:15,720 --> 00:08:20,460 They've really tried to make it simple. Offen is designed to be very lightweight. 223 00:08:20,460 --> 00:08:24,470 Deployment is basically just downloading a single binary file or pulling a Docker 224 00:08:24,470 --> 00:08:25,040 image. 225 00:08:25,040 --> 00:08:26,880 No huge complex installation. 226 00:08:26,880 --> 00:08:27,880 Right. 227 00:08:27,880 --> 00:08:30,080 And what if you don't want to set up a big dedicated database server? 228 00:08:30,080 --> 00:08:31,080 Yeah. 229 00:08:31,080 --> 00:08:32,080 What are the options there? 230 00:08:32,080 --> 00:08:33,200 You can just use SQLite. 231 00:08:33,200 --> 00:08:35,000 Can you explain SQLite simply? 232 00:08:35,000 --> 00:08:36,000 Sure. 233 00:08:36,000 --> 00:08:39,500 It's a file-based database, so instead of a whole separate server, all your data 234 00:08:39,500 --> 00:08:39,800 just 235 00:08:39,800 --> 00:08:42,140 lives in a single file on your web server. 236 00:08:42,140 --> 00:08:45,420 It simplifies the setup a ton, especially for smaller sites. 237 00:08:45,420 --> 00:08:47,160 And it handles security certificates too. 238 00:08:47,160 --> 00:08:48,160 It can, yeah. 239 00:08:48,160 --> 00:08:52,150 Offen can automatically install and renew your SSL certificates for you if you 240 00:08:52,150 --> 00:08:52,640 want. 241 00:08:52,640 --> 00:08:53,640 That's great.
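To show what "a file-based database" means in practice, here is a short Python sketch using the standard library's `sqlite3` module. It is a generic SQLite illustration, not Offen's schema or setup: the `events` table, its columns, and the `analytics.db` file name are assumptions made up for the example.

```python
import os
import sqlite3
import tempfile

# The whole database is one ordinary file; no separate database server runs.
db_path = os.path.join(tempfile.mkdtemp(), "analytics.db")

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE events (page TEXT, visited_at TEXT)")
conn.execute("INSERT INTO events VALUES (?, ?)", ("/about", "2024-06-01T12:00:00"))
conn.commit()
conn.close()

# Reopening the same file later gives the stored data back.
conn = sqlite3.connect(db_path)
rows = conn.execute("SELECT page FROM events").fetchall()
conn.close()

print(os.path.isfile(db_path), rows)  # True [('/about',)]
```

Because everything lives in that one file, backing up or moving a small installation can be as simple as copying it.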
242 00:08:53,640 --> 00:08:57,000 What about making it friendly for a global audience? 243 00:08:57,000 --> 00:08:58,240 Localization is pretty robust. 244 00:08:58,240 --> 00:09:03,240 It's available in English, French, German, Portuguese, Spanish, and Vietnamese. 245 00:09:03,240 --> 00:09:07,720 And crucially, the consent banner and the user's auditorium are localized. 246 00:09:07,720 --> 00:09:09,760 Which is key for trust. 247 00:09:09,760 --> 00:09:12,440 And can you make it match your website's design? 248 00:09:12,440 --> 00:09:13,440 You can. 249 00:09:13,440 --> 00:09:16,720 The consent banner is customizable: color, shape, basic fonts. 250 00:09:16,720 --> 00:09:19,640 You can make it feel like a part of your site, not some annoying pop-up. 251 00:09:19,640 --> 00:09:22,640 And for anyone listening who wants to just try this out right now, there's a demo 252 00:09:22,640 --> 00:09:23,040 command, 253 00:09:23,040 --> 00:09:24,040 right? 254 00:09:24,040 --> 00:09:25,040 There is. 255 00:09:25,040 --> 00:09:26,040 It's super easy. 256 00:09:26,040 --> 00:09:28,060 You can create a temporary demo environment on your own machine. 257 00:09:28,060 --> 00:09:34,360 Just open your terminal and run this command: curl https://demo.offen.dev | bash. 258 00:09:34,360 --> 00:09:36,020 And the login for that demo? 259 00:09:36,020 --> 00:09:40,520 The account is demo@offen.dev, and the password is just demo. 260 00:09:40,520 --> 00:09:44,240 That's the perfect way to see that difference between the operator view and the 261 00:09:44,240 --> 00:09:44,760 user view 262 00:09:44,760 --> 00:09:46,640 we were just talking about. 263 00:09:46,640 --> 00:09:50,940 So to sum it all up, Offen really is a genuine ethical alternative. 264 00:09:50,940 --> 00:09:55,360 It lets operators get the essential stats they need, unique sessions, traffic flow, 265 00:09:55,360 --> 00:09:57,880 without compromising user privacy at all.
266 00:09:57,880 --> 00:10:02,040 That combination of opt-in consent, end-to-end encryption, and limited data 267 00:10:02,040 --> 00:10:03,040 retention really 268 00:10:03,040 --> 00:10:04,040 sets a new standard. 269 00:10:04,040 --> 00:10:05,040 It does. 270 00:10:05,040 --> 00:10:07,940 And it's worth mentioning that the project got support from the NLnet Foundation 271 00:10:07,940 --> 00:10:08,100 as 272 00:10:08,100 --> 00:10:10,480 part of the Next Generation Internet Initiative. 273 00:10:10,480 --> 00:10:13,240 So it's part of a bigger movement for a more private web. 274 00:10:13,240 --> 00:10:14,380 It's a fantastic model. 275 00:10:14,380 --> 00:10:17,820 The sources make it clear that Offen puts the user on totally equal footing with 276 00:10:17,820 --> 00:10:18,040 the 277 00:10:18,040 --> 00:10:19,040 operator. 278 00:10:19,040 --> 00:10:21,300 And that leaves us with a final thought to consider. 279 00:10:21,300 --> 00:10:25,290 If users can use the exact same tools to analyze their own data as the operator 280 00:10:25,290 --> 00:10:26,120 uses for their 281 00:10:26,120 --> 00:10:31,240 aggregate data, does that level of radical transparency change how we value web 282 00:10:31,240 --> 00:10:32,160 services? 283 00:10:32,160 --> 00:10:35,720 Does fairness stop being just a box you have to tick for compliance and become the 284 00:10:35,720 --> 00:10:36,040 main 285 00:10:36,040 --> 00:10:40,400 reason someone wants to engage with your site? Something for you to chew on. 286 00:10:40,400 --> 00:10:41,400 That's a great question. 287 00:10:41,400 --> 00:10:45,150 And a quick reminder that this deep dive was supported by SafeServer, which helps 288 00:10:45,150 --> 00:10:45,440 host 289 00:10:45,440 --> 00:10:48,780 software and provides digital transformation support. 290 00:10:48,780 --> 00:10:52,400 They can help you deploy ethical, self-hosted software like Offen. 291 00:10:52,400 --> 00:10:55,680 You can learn more at www.safeserver.de.