Establishing service levels

Splunk Success Framework: Program Management step 4 of 8

When you implement Splunk software to support services, you might be asked to establish service-level objectives (SLOs), service-level agreements (SLAs), case priority levels, or incident response times to help track and manage service availability.

Service-level definitions provide all teams and organizations assurance that Splunk operations and response models meet their needs without impacting other areas.

You can use our guidelines and templates to create service levels for your organization. Adopt the examples as provided, or treat them as a baseline for establishing your own.

Guidelines for implementing service-level definitions

There are many factors to consider when making a service-level commitment.

Don't over-commit.

Consider which turnaround times would be too fast to deliver reliably.
Don't make a commitment that's unreasonable or unfair to your team members.

Don't under-commit.

The requester should never feel that the response time is unreasonably long.
Be as accommodating as possible when setting goals for turnaround.

Consider the time it takes to gather the necessary information.

Many types of requests require follow-up with the requester. Recognize that there may be a waiting period for this additional information.
Have reasonable expectations when requesting additional information. Make sure to communicate your expectations to the requester.
The requester should be aware that a slow response on their part will delay the expected turnaround time.

Think about the process for incoming requests.

Think about your engagement model for incoming requests. Optimize the request process so teams can work together effectively.
Create a process that is straightforward and effective.

SLO templates

SLOs provide expectations for maintenance planning, release planning, and communication with business partners. You can divide SLOs into administrative tasks (day-to-day activities) and implementation tasks.

The following templates provide suggestions. You can adapt them to commitments that work for your organization.

Administrative SLOs (day-to-day activities)	Target
Delete user	5 business days
Add new user	5 business days
Elevate user permissions	25 business days
Dashboard creation support	10 business days
Report generation support	5 business days
Alert creation and changes support	5 business days
Create a new Active Directory group for access (External dependencies)	25 business days
Create new role	15 business days

Implementation SLOs	Target
First response to new support request	1 business day
Data ingest (standard add-on)	5 business days
Data ingest (custom add-on)	10 business days
New app install	1 business day
Universal forwarder deployment (Does not include change control SLO)	10 business days
Data source monitor (HTTP, WMI, TCP/UDP)	2 business days
Implement new global knowledge object	1 business day
Upload data into Splunk (for example, static log, file, CSV)	5 business days
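
Because the SLO targets above are expressed in business days, it can help to turn a target into a concrete due date. The following is a minimal sketch, not part of the framework itself. It assumes a Monday-to-Friday working week and an optional holiday list that you maintain. The names `SLO_TARGET_BUSINESS_DAYS`, `add_business_days`, and `slo_due_date` are hypothetical, and the dictionary holds only a few of the template targets for illustration.

```python
from datetime import date, timedelta

# Hypothetical subset of the template targets above, in business days.
SLO_TARGET_BUSINESS_DAYS = {
    "Add new user": 5,
    "Dashboard creation support": 10,
    "Data ingest (standard add-on)": 5,
    "Data ingest (custom add-on)": 10,
    "New app install": 1,
}


def add_business_days(start: date, days: int, holidays=frozenset()) -> date:
    """Return the date that falls `days` business days after `start`.

    Assumption: business days are Monday through Friday, minus any
    dates listed in `holidays`. The start date itself is not counted.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        # weekday() returns 0-4 for Monday-Friday, 5-6 for the weekend.
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current


def slo_due_date(task: str, received: date, holidays=frozenset()) -> date:
    """Look up the target for `task` and return the resulting due date."""
    return add_business_days(received, SLO_TARGET_BUSINESS_DAYS[task], holidays)


if __name__ == "__main__":
    received = date(2024, 1, 8)  # a Monday
    print(slo_due_date("Add new user", received))
```

Under these assumptions, a request received on Monday, January 8, 2024 with a 5-business-day target is due on Monday, January 15, 2024. If your organization counts the day of receipt or uses a different working week, adjust the loop accordingly.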

SLA templates

SLAs are key service definitions for platform availability and incident response.

The following templates provide suggestions. You can adapt them to commitments that work for your organization.

SLAs	Target
Platform availability	99.9% uptime for all core services (< 8.76 hours unplanned downtime per year)
Incident first response	Based on priority
Incident status update	Based on priority
Restore loss of data feed (ingestion)	1 business day
Restore universal forwarder not reporting (standard)	5 business days
Restore universal forwarder not reporting (mission critical applications)	1 business day
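
The platform availability target translates directly into a downtime budget: a 365-day year has 8,760 hours, and 0.1% of that is 8.76 hours. The sketch below shows the arithmetic so you can test other targets or reporting periods. The function names are hypothetical, and the calculation assumes that only unplanned downtime counts against the budget and that leap years are ignored.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored


def downtime_budget_hours(availability_percent: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime allowed in a period for an availability target."""
    return period_hours * (1 - availability_percent / 100)


def measured_availability(downtime_hours: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the availability percentage achieved for the observed downtime."""
    return 100 * (1 - downtime_hours / period_hours)


def meets_target(downtime_hours: float, target_percent: float,
                 period_hours: float = HOURS_PER_YEAR) -> bool:
    """Check whether the observed downtime stays within the budget."""
    return downtime_hours <= downtime_budget_hours(target_percent, period_hours)


if __name__ == "__main__":
    print(round(downtime_budget_hours(99.9), 2))                    # yearly budget
    print(round(downtime_budget_hours(99.9, period_hours=720), 2))  # 30-day budget
```

For a 30-day reporting period (720 hours), the same 99.9% target allows about 0.72 hours, or roughly 43 minutes, of unplanned downtime.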

Case priority levels

Case priorities can vary by service or source.

The following templates provide suggestions. You can adapt them to commitments that work for your organization.

Case priority level	Definition
P1	A mission-critical outage for which there is no workaround. This may be a complete service outage of a core service.
P2	A mission-critical outage for which a less-than-ideal workaround exists. This may be a partial service outage of a core service.
P3	An outage or issue impacting a single user.
P4	Standard service requests or routine changes, such as access requests, data onboarding, or app installation.

Incident response times

Incident response	P1	P2	P3	P4
First response	1 hour	2 hours	4 hours	1 business day
Communicated updates	Every 2 hours	Every 4 hours	Every business day	Every 5 business days
Resolution time	Within 4 hours	Within 2 business days	Within 3 business days	Agreement with the customer
Business hours	24 hours / 7 days per week	8:00am to 5:00pm / 5 days per week excluding holidays	8:00am to 5:00pm / 5 days per week excluding holidays	8:00am to 5:00pm / 5 days per week excluding holidays
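
The first-response row of the table above can be turned into a deadline calculation. The sketch below is one possible way to do it, not a prescribed implementation. It assumes naive local timestamps, an 8:00am to 5:00pm Monday-to-Friday business day for P2 through P4, a holiday set that you supply, and that "1 business day" for P4 means one full 9-hour working day. All names are hypothetical.

```python
from datetime import datetime, time, timedelta

DAY_START = time(8, 0)   # 8:00am
DAY_END = time(17, 0)    # 5:00pm

# First-response targets from the table: (target, counted in business hours only?)
# Assumption: P4's "1 business day" is modeled as one 9-hour working day.
FIRST_RESPONSE = {
    "P1": (timedelta(hours=1), False),  # 24 hours / 7 days per week
    "P2": (timedelta(hours=2), True),
    "P3": (timedelta(hours=4), True),
    "P4": (timedelta(hours=9), True),
}


def _next_business_moment(moment: datetime, holidays) -> datetime:
    """Move `moment` forward to the nearest instant inside business hours."""
    while True:
        is_workday = moment.weekday() < 5 and moment.date() not in holidays
        if is_workday and moment.time() < DAY_START:
            return datetime.combine(moment.date(), DAY_START)
        if is_workday and moment.time() < DAY_END:
            return moment
        # After hours, weekend, or holiday: try the next day's opening time.
        moment = datetime.combine(moment.date() + timedelta(days=1), DAY_START)


def add_business_time(start: datetime, duration: timedelta,
                      holidays=frozenset()) -> datetime:
    """Add `duration` to `start`, counting only time inside business hours."""
    current = _next_business_moment(start, holidays)
    remaining = duration
    while True:
        day_end = datetime.combine(current.date(), DAY_END)
        available = day_end - current
        if remaining <= available:
            return current + remaining
        remaining -= available
        current = _next_business_moment(day_end, holidays)


def first_response_due(priority: str, opened: datetime,
                       holidays=frozenset()) -> datetime:
    """Return the first-response deadline for a case opened at `opened`."""
    target, business_hours_only = FIRST_RESPONSE[priority]
    if business_hours_only:
        return add_business_time(opened, target, holidays)
    return opened + target  # P1 runs on a 24x7 clock


if __name__ == "__main__":
    opened = datetime(2024, 1, 12, 16, 30)  # Friday, 4:30pm
    print(first_response_due("P2", opened))
```

Under these assumptions, a P2 case opened on Friday at 4:30pm uses 30 minutes of Friday and the remaining 90 minutes on Monday, so the first response is due Monday at 9:30am. A P1 case opened at the same time is due one hour later, regardless of the day.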

Key terms

Mission critical. An outage impacting revenue, the ability to meet an agreed SLA or OLA, or a noted mission-critical data source or app.
Core service. Indexing, Searching, or Alerting.
Routine change. A low-impact, low-risk change not requiring a change review.
Emergency change. A change required to resolve a P1/P2 condition.
Service request. A request to add new capacity that is less complex than a project, such as new inputs or new add-on installs.

More resources

For more information on Splunk Cloud Platform SLAs, see Splunk Cloud Service Level Agreements.
