SLAs, SLOs, and other timelines have always been a source of misconception in security and development teams. We are currently working on a whitepaper on vulnerabilities, methodologies and SLA/SLO (you can find more information here).
In this article, we give an overview of the various timelines that can influence an SLA and explain what they are.
Let’s start with the definitions of SLA, SLO and related terms:
- A service-level agreement (SLA) is a commitment between a service provider and a client. In our specific case, an SLA is the number of days within which a specific vulnerability must be fixed.
- A service-level objective (SLO) is a key element of a service-level agreement between a service provider and a customer. It is similar to an SLA, but it is an objective rather than an agreement.
- SLI (service-level indicator) – We will not cover these here as they can be misleading for the article.
- OKR – Objectives and key results: the objectives a team (specifically a DevOps team) sets out to achieve. A quick example of an objective for the team could be the number of vulnerabilities resolved per sprint or a balance between user stories and security/bug fixes.
| |SLI|SLO|SLA|
|---|---|---|---|
|Definition|A quantifiable measure of reliability|A target reliability level objective|A legal contract or agreement that, if breached, will have penalties|
|Example|The number of vulnerabilities should be < 10 for every release|Critical vulnerabilities will be resolved in 28 days 95% of the time|Publicly available products will have 0 critical vulnerabilities upon release; critical vulnerabilities disclosed will be resolved in 10 days|
|Who sets it|Security teams in collaboration with Product Owners|Product Owner in partnership with security teams|Business Development, Legal teams, IT and DevSecOps|
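To make the SLO example from the table concrete, here is a minimal sketch that checks whether "critical vulnerabilities resolved in 28 days, 95% of the time" was met. The function name `slo_compliance` and the sample dates are assumptions made up for illustration, not part of any tool.

```python
from datetime import date

# Objective taken from the table's SLO example: critical vulnerabilities
# resolved within 28 days, 95% of the time.
SLO_DAYS = 28
SLO_TARGET = 0.95

def slo_compliance(resolved, slo_days=SLO_DAYS):
    """Return the fraction of (discovered, resolved) date pairs fixed within slo_days."""
    if not resolved:
        return 1.0  # nothing to fix, so nothing breached
    within = sum(1 for found, fixed in resolved if (fixed - found).days <= slo_days)
    return within / len(resolved)

# Illustrative data only: (date discovered, date resolved)
critical_vulns = [
    (date(2023, 1, 2), date(2023, 1, 20)),   # 18 days
    (date(2023, 1, 5), date(2023, 2, 1)),    # 27 days
    (date(2023, 1, 10), date(2023, 2, 20)),  # 41 days, misses the objective
    (date(2023, 2, 1), date(2023, 2, 10)),   # 9 days
]

ratio = slo_compliance(critical_vulns)
print(f"{ratio:.0%} resolved within {SLO_DAYS} days; SLO met: {ratio >= SLO_TARGET}")
```

With this sample data three out of four vulnerabilities are fixed in time, so the 95% objective is missed even though most tickets were on schedule.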
The timelines to fix vulnerabilities are dictated by several events and are, in reality, composed of a number of timelines. We start from the official public timeline (bottom), which runs from the public or private disclosure of a vulnerability to the release of a patch/bugfix.
At any point in this evolution, your system can detect the vulnerability.
Normally this happens when your tooling releases a detection for a newly disclosed vulnerability. Zero day is the period of time between the vulnerability being disclosed and the patch/fix being released by the vendor (second timeframe).
Usually, when a vulnerability is disclosed publicly, security scanner vendors tend to release detection for it within days so that organizations can detect it in their systems.
The exposure window is usually defined as the time from the disclosure of the vulnerability to the time of its resolution in your system. In practice, however, the timer for the exposure window starts when the vulnerability is identified in your system and stops when it is resolved.
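The two ways of measuring the exposure window can be compared with a small sketch. The function name `exposure_windows` and the timestamps are hypothetical and only illustrate the arithmetic.

```python
from datetime import datetime

def exposure_windows(disclosed, detected, resolved):
    """Return (public, internal) exposure windows as timedeltas.

    public:   from public disclosure to resolution in your system
    internal: from detection in your system to resolution
    """
    return resolved - disclosed, resolved - detected

# Illustrative timestamps only
disclosed = datetime(2023, 3, 1)   # vulnerability published
detected = datetime(2023, 3, 6)    # scanner flags it in your system
resolved = datetime(2023, 3, 20)   # fix deployed and confirmed

public, internal = exposure_windows(disclosed, detected, resolved)
print(f"public window: {public.days} days, internal window: {internal.days} days")
```

The gap between the two numbers is the time the vulnerability existed publicly before your tooling saw it, which is why the choice of start event matters when reporting.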
SLA or SLO targets are usually measured from the time the vulnerability is discovered in the system or from the time the ticket is raised with the individual team (resolution SLA).
When the vulnerability gets marked as a false positive
Discovery to Declaration to CVE – This timeline is usually the most dangerous and relates to the discovery of a vulnerability. Commonly in this timeline there is no patch available, and systems are at risk from so-called zero-days.
- Disclosure in the wild – The vulnerability is disclosed widely on the web for various reasons, giving the vendor no chance to fix it. The resolution/mitigation time becomes critical.
- CVE Registration – The CVE register acknowledges the vulnerability, and the vulnerability receives a specific identifier.
- PoC – Proof of concept made available – Usually, the PoC is a piece of code that exploits the vulnerability in systems.
- Vulnerability identified in network/container/code.
- The vulnerability being worked on by a team – A vulnerability/patch is not always straightforward to fix. Sometimes an update is simple and requires only a few changes, whereas other times it requires extensive testing and careful planning.
- The vulnerability is being remediated by the team.
- Vulnerability remedy being confirmed (pentest, security scanner).
SLA, SLO and Vulnerability Timelines
SLA and SLO definitions
An SLA/SLO based on severity lacks context (the importance and criticality of the asset or data), while an SLA/SLO based on risk is more precise but can vary over time as threat intelligence, exposure and similar factors change.
- Based on the severity of vulnerabilities – does not account for context and is fixed
- Based on risk – accounts for the criticality of assets and varies over time
Ultimately it’s up to you which SLA you would like to use; what matters is that it is agreed with the development teams.
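The difference between the two approaches can be sketched as follows. The day values, the 1 to 3 asset criticality score and the halving for active exploitation are all assumptions chosen for illustration; a real policy would define its own.

```python
# Hypothetical severity-based targets in days; the values are illustrative only.
SEVERITY_SLA_DAYS = {"critical": 10, "high": 30, "medium": 60, "low": 90}

def severity_sla(severity):
    """Fixed target: depends only on the vulnerability's severity."""
    return SEVERITY_SLA_DAYS[severity]

def risk_sla(severity, asset_criticality, exploited_in_wild):
    """Context-aware target: tightens for critical assets and active threats.

    asset_criticality is an assumed score from 1 (low) to 3 (high).
    """
    days = SEVERITY_SLA_DAYS[severity]
    days = days / asset_criticality   # more important assets get less time
    if exploited_in_wild:
        days = days / 2               # threat intelligence shortens the target
    return max(1, round(days))

print(severity_sla("high"))        # same answer for every asset
print(risk_sla("high", 1, False))  # low-criticality asset, no known exploitation
print(risk_sla("high", 3, True))   # critical asset, exploited in the wild
```

The severity-based target never changes, while the risk-based target for the same vulnerability moves whenever the asset context or the threat intelligence changes.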
Vulnerability Related SLA:
- Discovery SLA = This SLA defines the agreed time within which a team should fix the vulnerability, measured from the time of discovery (in the system) to the time of resolution.
- Resolution/Acknowledgment SLA = This SLA defines the agreed time within which a team should aim to fix a vulnerability. Usually, the clock starts when the ticket is acknowledged or after it has been triaged.
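The only difference between these two SLAs is the event that starts the clock. A minimal sketch, assuming a hypothetical `Ticket` record with the two timestamps and a 30-day target:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Ticket:
    discovered_at: datetime      # vulnerability first seen in the system
    acknowledged_at: datetime    # ticket acknowledged or triaged by the team

def discovery_deadline(ticket, sla_days):
    """Discovery SLA: the clock starts at discovery in the system."""
    return ticket.discovered_at + timedelta(days=sla_days)

def resolution_deadline(ticket, sla_days):
    """Resolution SLA: the clock starts at acknowledgment/triage."""
    return ticket.acknowledged_at + timedelta(days=sla_days)

# Illustrative values only
ticket = Ticket(discovered_at=datetime(2023, 5, 1), acknowledged_at=datetime(2023, 5, 4))
print(discovery_deadline(ticket, 30).date())
print(resolution_deadline(ticket, 30).date())
```

Any delay between discovery and acknowledgment pushes the resolution deadline later by the same amount, which is the gap the two SLAs measure differently.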
Risk Related SLA
- Risk Triage SLA = This SLA defines the agreed time within which a risk should be triaged and accepted or rejected.
- Risk SLA = This SLA defines the agreed maximum time a risk may remain in an accepted, signed-off status (maximum risk time).
Timers and Statistical Indicators
There are a number of indicators used to measure performance and average resolution times against an SLA or SLO:
MTTR (mean time to resolve) is the average time it takes to fully resolve a failure. This includes the time spent detecting the failure, diagnosing the problem, and repairing the issue.
MTTA (mean time to acknowledge) is the average time it takes from when an alert is triggered to when work begins on the issue. This metric is useful for tracking your team’s responsiveness and your alert system’s effectiveness.
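Both indicators are plain averages over timestamp differences. A minimal sketch, assuming each incident is a dictionary with hypothetical `alerted`, `acknowledged` and `resolved` timestamps:

```python
from datetime import datetime
from statistics import mean

def mean_hours(pairs):
    """Average duration in hours over (start, end) datetime pairs."""
    return mean((end - start).total_seconds() / 3600 for start, end in pairs)

# Illustrative incidents only
incidents = [
    {"alerted": datetime(2023, 6, 1, 9), "acknowledged": datetime(2023, 6, 1, 10),
     "resolved": datetime(2023, 6, 2, 9)},
    {"alerted": datetime(2023, 6, 5, 9), "acknowledged": datetime(2023, 6, 5, 12),
     "resolved": datetime(2023, 6, 5, 21)},
]

# MTTA: alert triggered -> work begins; MTTR: alert triggered -> fully resolved
mtta = mean_hours((i["alerted"], i["acknowledged"]) for i in incidents)
mttr = mean_hours((i["alerted"], i["resolved"]) for i in incidents)
print(f"MTTA: {mtta:.1f} h, MTTR: {mttr:.1f} h")
```

Because these are means, a single long-running ticket can skew the result, so it can help to look at the individual durations alongside the average.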
Some notes on the above SLA/SLO
- The Discovery SLA is controversial because it does not measure from the exact time a ticket was raised with the team that needs to solve it, but it gives a good idea of the age of a vulnerability in the organization
- MTTR resolution times also need to account for business downtime, unless you have follow-the-sun teams that can work on resolution around the clock
- MTTR and other resolution metrics should account for release cycles. A vulnerability whose story has been resolved might not be detected as fixed by the scanner immediately, so the calculation should compensate for this buffer
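The last two notes can be sketched as simple adjustments to the raw resolution time. The Monday to Friday working week and the two-day scanner buffer below are assumptions for illustration; the function names are hypothetical.

```python
from datetime import date, timedelta

def business_days(start, end):
    """Count weekdays (Mon-Fri) from start (inclusive) to end (exclusive)."""
    days = 0
    current = start
    while current < end:
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            days += 1
        current += timedelta(days=1)
    return days

def adjusted_resolution_time(detected, confirmed, scan_buffer):
    """Resolution time with an assumed scanner/release lag subtracted."""
    raw = confirmed - detected
    return max(raw - scan_buffer, timedelta(0))

# Illustrative values only
detected = date(2023, 6, 1)    # vulnerability detected in the system
confirmed = date(2023, 6, 15)  # scanner confirms the fix
buffer = timedelta(days=2)     # assumed lag between release and next scan

print(business_days(detected, confirmed))
print(adjusted_resolution_time(detected, confirmed, buffer).days)
```

Whichever adjustment is used, it should be agreed with the teams up front so that reported times are comparable from one period to the next.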
The key factor to address in an organization is the collaboration between the security team and the development teams. The key to resolution is that the team responsible for fixing vulnerabilities actually works on them. SLA, SLO, and SLI are purely a form of agreement between security and development and should be used as guidance to create objectives for OKRs and other security business requirements for each team.