Welcome to the Deep Dive, your shortcut to getting up to speed quickly. Today our
mission is really focused.
We're tackling one of the biggest headaches in digital security: keeping web apps
and APIs safe.
We are diving deep into the sources around open-appsec.
It's a machine learning engine, and the promise here is pretty big: fundamentally
changing web defense, moving security away from, you know,
reactive fixes towards automatic, preemptive protection against the big stuff like
the OWASP Top 10 and, yeah,
even zero-days. Now, before we jump right in, just a quick word: this deep dive is
supported by Safe Server.
They handle software hosting and can support you through your digital
transformation. You can find out more about them at
www.safeserver.de
Okay, let's get into it. If you're in tech,
you probably know the pain of the traditional web application firewall, the WAF
that sits in front of your app. Historically,
managing it is a constant battle: hours writing signatures and exceptions, dealing
with false positives. Yeah, that's exactly the context.
The source material really positions open-appsec as, well, the opposite of that
high-maintenance reality.
They define it pretty simply: a machine learning security engine, one that
automatically and, crucially, preemptively
stops threats against web apps and APIs.
The core idea isn't just catching known bad stuff.
It's about learning what normal looks like for your specific application. And
that's the bit that really grabs your attention, right?
Painless to configure and manage, effective security, and it's open source.
Honestly, knowing the usual operational grind, that almost sounds, well,
too good to be true. Well, the claim really rests on two core ideas they hammer
home:
preemptive and precise.
Preemptive means it acts without needing those constant signature updates. This is
where those pretty startling claims come in.
They state it blocked major zero-days like Log4Shell and Spring4Shell with no
prior knowledge, no updates needed.
That's, well, that's a huge deal. Wow. Okay, if that holds water, it means the
system isn't just matching patterns.
It's understanding the attack's intent somehow. Precisely. Yeah, and that links
directly to the second idea:
precision.
Because the machine learning is continuously adapting to your environment, it
supposedly cuts down
drastically on the noise, you know, the false positives, the endless exceptions
that make old-school WAFs such a chore.
Okay. So before we get into the ML engine itself, the sort of brain of the
operation, what happens first when a request comes in?
What's the groundwork? Right. It needs to fully understand the data first, which
can be tricky.
So for every single HTTP request, the engine starts by decoding everything, all
the parts of the payload.
It pulls out any nested data, specifically looking for JSON or XML sections hidden
inside.
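To make that decoding step a bit more concrete, here is a minimal Python sketch. It is not open-appsec's actual parser, just an illustration of the idea under simple assumptions: the body is form-encoded, and any value may itself hide JSON or XML. The function names `extract_values` and `extract_from_form` and the `max_depth` limit are hypothetical.

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl


def extract_values(text, depth=0, max_depth=5):
    """Return the leaf strings in text, unpacking JSON or XML nested inside it."""
    text = text.strip()
    if depth > max_depth:
        return [text]  # stop unpacking; treat the rest as opaque
    if text[:1] in ("{", "["):
        try:
            return _walk_json(json.loads(text), depth, max_depth)
        except json.JSONDecodeError:
            pass  # looked like JSON but is not; fall through
    if text[:1] == "<":
        try:
            # Note: a hardened parser would be needed for untrusted XML.
            root = ET.fromstring(text)
        except ET.ParseError:
            pass
        else:
            leaves = []
            for element in root.iter():
                for attr_value in element.attrib.values():
                    leaves += extract_values(attr_value, depth + 1, max_depth)
                if element.text and element.text.strip():
                    leaves += extract_values(element.text, depth + 1, max_depth)
            return leaves
    return [text]


def _walk_json(node, depth, max_depth):
    """Collect leaf values from parsed JSON, re-inspecting every string."""
    if isinstance(node, dict):
        return [v for item in node.values() for v in _walk_json(item, depth, max_depth)]
    if isinstance(node, list):
        return [v for item in node for v in _walk_json(item, depth, max_depth)]
    if isinstance(node, str):
        return extract_values(node, depth + 1, max_depth)
    return [str(node)]


def extract_from_form(body):
    """URL-decode a form body and unpack each value, keyed by parameter name."""
    result = {}
    for name, value in parse_qsl(body, keep_blank_values=True):
        result.setdefault(name, []).extend(extract_values(value))
    return result
```

The point of the sketch is only that whatever inspects the request afterwards sees the innermost values, not the encoded wrapper around them.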
Only after it's fully parsed and understood the raw request and applied basic IP
checks
does the machine learning part kick in. Got it. So let's unpack that core
intelligence.
The sources talk about a two-phase, dual-engine process, not just one big AI model.
How does it figure out good versus bad?
Yeah, it's clever. It combines
global knowledge with very specific local context. The main goal is spotting
requests that just don't fit the normal pattern,
things that fall outside how users should be interacting with the application.
Okay, phase one.
That's the supervised model, the global schooling part. That's a great way to
think about it.
Yeah, the supervised model is like the global expert.
It's trained offline on a massive data set, millions of requests, both malicious
attack traffic and perfectly normal benign traffic,
collected from all over.
Think of it as the system's baseline understanding of known attack types seen
across the Internet.
What kind of data does it look at? It considers a whole range of things:
known attack indicators, the IP's reputation, user agents, browser fingerprints,
other sort of contextual clues.
It does a quick comparison: does this incoming request look like any known global
attack patterns?
The sources mention there's a basic model for, say, testing or monitoring only,
but the advanced model,
the one they recommend for production, gets updated via their portal. It keeps
learning globally.
Okay, so the supervised model acts as a first filter.
It lets the obviously good stuff through and flags what's clearly bad based on
global patterns. But what about the gray-area stuff,
the stuff that looks suspicious globally but might be okay locally? That's where
phase two comes in. Exactly.
That's the handoff. If phase one says, hmm,
this looks risky or suspicious, then the analysis moves to the unsupervised model,
and this one is completely different.
It's the local detective. It doesn't use that global training data. Instead,
it builds its intelligence in real time, right there in your protected environment.
Ah, okay.
So it's learning the specific quirks and patterns of my application, my users.
I see. It looks at hyper-local context, things like the exact URL being hit, the
usual traffic patterns for that specific endpoint,
maybe the history of the user involved. It uses this local perspective to generate
a final confidence score, and that score decides:
block or allow. It's that blend, the global knowledge of attacks plus the specific
understanding of your application's behavior,
that's supposed to deliver that promised precision. Interesting.
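As a rough illustration of that two-phase handoff, here is a toy Python sketch. It is an assumption-heavy stand-in, not the real models: phase one is reduced to a hypothetical list of attack indicators, phase two to per-URL payload-length statistics, and the names `global_score`, `LocalBaseline`, `decide`, the weights, and the threshold are all invented for the example.

```python
from statistics import mean, pstdev

# Hypothetical stand-in for globally learned attack knowledge.
ATTACK_INDICATORS = ("${jndi:", "<script", "' or ", "union select", "../")


def global_score(payload):
    """Phase one stand-in: score 0..1 from known attack indicators."""
    hits = sum(indicator in payload.lower() for indicator in ATTACK_INDICATORS)
    return min(1.0, hits / 2)


class LocalBaseline:
    """Phase two stand-in: learns what payload lengths are normal per URL."""

    def __init__(self):
        self.lengths = {}

    def learn(self, url, payload):
        self.lengths.setdefault(url, []).append(len(payload))

    def anomaly(self, url, payload):
        seen = self.lengths.get(url, [])
        if len(seen) < 5:
            return 0.5  # too little local history to have an opinion
        sigma = pstdev(seen) or 1.0  # avoid dividing by zero
        z = abs(len(payload) - mean(seen)) / sigma
        return min(1.0, z / 4)


def decide(url, payload, baseline, threshold=0.6):
    """Only requests phase one finds suspicious reach the local model."""
    suspicion = global_score(payload)
    if suspicion == 0.0:
        return "allow"
    confidence = 0.5 * suspicion + 0.5 * baseline.anomaly(url, payload)
    return "block" if confidence >= threshold else "allow"
```

In this sketch, a `../` in a payload counts as a global indicator, but on an endpoint where such payloads are routine the local score stays low and the request is allowed. That is the gray-area behavior described above, in miniature.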
So, moving beyond just the core ML engine, the sources indicate this isn't just a
simple filter, even with the two phases.
What other security layers are built in? It sounds like a suite. It really is
presented as a comprehensive stack,
yeah, bundled into one engine. A big one for modern setups is API security.
The engine apparently discovers your APIs automatically and helps narrow the
attack surface.
It can enforce strict OpenAPI schema validation, making sure the API calls
actually look like they're supposed to.
Exactly, making sure the traffic fits the expected format. And what about
protection against, you know, standard known vulnerabilities?
We still need that, right? Absolutely, and that's integrated too. There's an
enhanced intrusion prevention system, an IPS. This protects against,
I think the number was, over 2,800
specific web CVEs, known vulnerabilities.
It uses NSS-certified tech and an open Snort 3.0 engine. So you get this smart ML
detection and strong defense against known exploits.
Best of both worlds, supposedly. Okay, makes sense. What else? Bots are a huge
problem.
Yep, covered. They include anti-bot capabilities
designed to identify and stop automated attacks, scrapers,
intrusion attempts, before they cause damage. Right, that handles incoming traffic
threats.
But what about uploads? Malicious files are a classic vector. Good point. They've
built in file security.
It scans uploaded files automatically, and it checks them against a big cloud
repository for known malicious file reputations.
That helps stop nasty executables or scripts getting onto your servers. Mm. I also
saw mention of some pretty
advanced rate limiting controls, more than just blocking IPs. Yeah, the rate
limiting seems quite flexible.
You can set limits based not just on IP address
but on identifiers inside the session, like, you know, specific keys found in JWTs,
JSON Web Tokens, or maybe values in cookies or custom headers.
That's really useful for protecting individual user accounts or API keys from brute
force or abuse. That is more granular.
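To picture what limiting on a session identifier rather than an IP can look like, here is a small Python sketch. It does not reflect open-appsec's configuration or internals. The `rate_key` priority order, the `X-Api-Key` header name, and the `SlidingWindowLimiter` class are assumptions made for the example, and the JWT claim is read without verifying the signature, which is acceptable for choosing a rate-limit bucket but never for authentication.

```python
import base64
import json
from collections import defaultdict, deque


def jwt_claim(token, claim):
    """Read one claim from a JWT payload WITHOUT verifying the signature."""
    try:
        payload_b64 = token.split(".")[1]
        padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore padding
        return json.loads(base64.urlsafe_b64decode(padded)).get(claim)
    except (IndexError, ValueError, AttributeError):
        return None  # malformed token: caller falls back to another key


def rate_key(headers, client_ip):
    """Pick the most specific identifier available for this request."""
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        subject = jwt_claim(auth[len("Bearer "):], "sub")
        if subject:
            return f"jwt-sub:{subject}"
    api_key = headers.get("X-Api-Key")  # hypothetical custom header
    if api_key:
        return f"api-key:{api_key}"
    return f"ip:{client_ip}"


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)

    def allow(self, key, now):
        recent = self.hits[key]
        while recent and now - recent[0] >= self.window:
            recent.popleft()  # drop requests that left the window
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True
```

Because the key follows the token's subject, the same account rotating through many IP addresses still lands in one bucket, which is the brute-force case mentioned above.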
Definitely. And one more interesting feature: crowd wisdom. They partner with
CrowdSec.
This means the system gets real-time threat intelligence from, I think it's over
64,000 contributing servers.
So if an IP address starts attacking applications elsewhere on the network,
your system can learn about it almost instantly and block it proactively. Kind of a
neighborhood watch for servers.
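The neighborhood-watch idea boils down to a shared list of bad addresses that every participant consults. Here is a minimal Python sketch of that idea only. It does not use CrowdSec's API or data format, and the `SharedBlocklist` class, its methods, and the one-hour default expiry are hypothetical.

```python
import ipaddress


class SharedBlocklist:
    """Toy shared blocklist: reported addresses or ranges expire after a TTL."""

    def __init__(self):
        self._entries = {}  # network -> expiry timestamp in seconds

    def report(self, cidr, now, ttl_seconds=3600):
        """Record an address or CIDR range that attacked some participant."""
        network = ipaddress.ip_network(cidr, strict=False)
        expiry = now + ttl_seconds
        self._entries[network] = max(self._entries.get(network, 0), expiry)

    def is_blocked(self, ip, now):
        """True if ip falls inside any reported range that has not expired."""
        address = ipaddress.ip_address(ip)
        return any(
            address.version == network.version
            and address in network
            and expiry > now
            for network, expiry in self._entries.items()
        )
```

One server calls `report` when it sees an attack, and every other server's `is_blocked` check picks it up on the next request, until the entry ages out.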
Okay, let's circle back to those zero-day claims, because that's really the
headline grabber: blocking Log4Shell without a signature.
How do the sources justify that confidence? It still sounds almost magical. The
justification comes back to that unsupervised model, the anomaly detection part.
Because it learns normal so well, anything drastically outside that norm gets
flagged,
even if the specific attack method has never been seen before.
The sources specifically list Log4Shell, Spring4Shell, also Text4Shell, that Apache
Commons Text vulnerability,
CVE-2022-42889, and they even mention a tricky WAF bypass technique using JSON
syntax hidden in SQL injection payloads.
In each case the argument is that the system saw behavior or request structures
that were just weird,
highly anomalous compared to the learned baseline for that application. So it got
blocked regardless of the specific attack signature, right?
It's not matching a known bad pattern. It's spotting "not normal". Okay, so
thinking about implementation: if I'm running modern infrastructure, cloud,
containers, CI/CD pipelines, how does this fit in? If the goal is painless, it
can't involve tearing everything down.
No, it seems designed explicitly for that. It's described as cloud native and
CI/CD friendly.
You can deploy it using declarative methods like infrastructure as code, or manage
it via APIs.
Platform-wise, it usually works as an add-on.
You can deploy it with Linux, Docker, and Kubernetes setups, and it integrates with
common reverse proxies: NGINX, Kong, APISIX,
Envoy,
the usual suspects. And management flexibility? Different teams have different
preferences. Seems like it.
You can use declarative config files, Kubernetes Helm charts, or annotations for
that automated K8s workflow,
or they offer a SaaS web interface for managing it visually. So, options depending
on your operational style.
Good, good. And finally, that open source aspect. That's pretty important for
trust, especially with security tools. Absolutely crucial.
It's under the Apache 2.0 license. The core engine code is available.
What's interesting architecturally is they seem to have separated the main logic,
which is mostly C++, from the bits that connect to the web server,
the attachment, in C, and the part that syncs learning data between agents, smart
sync, in Go.
That modularity helps with auditing, and critically, the sources highlight that an
independent third-party security audit was done on the code back in
2022. That definitely helps build confidence. Okay, this has been a really
insightful dive.
We've traced the shift, haven't we, from those brittle signature-based rules
needing constant tweaking, that high-maintenance WAF job,
towards something self-learning, adaptive, focused on preemptive defense.
The big takeaway here seems to be about changing the operational overhead of
application security.
Moving away from endless manual tuning lets security teams maybe focus on
higher-level strategy.
It really could shift the focus. And that leads to a really interesting question
for you, the listener, to think about.
If this kind of technology truly becomes an install-and-forget solution, one that
learns continuously and defends preemptively,
what does that mean for the future role of the security analyst,
particularly the ones whose main job today is that constant monitoring and
fine-tuning of complex security policies?
That's the potential strategic shift to consider in your own planning. A really
provocative thought to end on. Think about how that might change things
in your own environment or career. Thank you for joining us for this deep dive.
And thanks once again to our sponsor, Safe Server, for making this exploration
possible.
Remember to check out how they can support your software hosting and digital
transformation at
Join us next time on the Deep Dive.
