Managing various limits in Splunk Observability Cloud

Splunk Observability Cloud has a number of limits related to license usage, data ingestion, API calls, and more that you need to be aware of. This article discusses ways to manage or work around these limits.

Limit types and management options

Detectors

Detectors notify you when a defined threshold is reached. The default limit is 1,000 detectors. This limit can be increased by request, but a better option is to redesign your detectors by grouping multiple alert rules into a single detector. For example, you can define separate rules for different environments and severity levels within the same detector. Make sure your groupings are logical, though, or they won't be useful; you probably wouldn't want to group memory and CPU utilization into a single detector, for example.
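
A single detector can carry several rules whose detect labels map to different publish() labels in the SignalFlow program. The following is a minimal sketch in Python of creating such a detector through the REST API, assuming a standard cpu.utilization metric with an env dimension, placeholder realm, token, and thresholds, and empty notification lists; adjust all of these for your environment.

    # Minimal sketch: one detector that groups several related alert rules.
    # Assumptions: the cpu.utilization metric and an "env" dimension exist
    # in your data; REALM and TOKEN are placeholders; notifications are left
    # empty so nothing fires to a real channel.
    import requests

    REALM = "us1"                 # your Splunk Observability Cloud realm
    TOKEN = "<org-access-token>"  # token with API write permission

    program_text = """
    cpu = data('cpu.utilization').mean(by=['env', 'host'])
    detect(when(cpu > 95, lasting='5m')).publish('cpu_critical')
    detect(when(cpu > 85, lasting='15m')).publish('cpu_major')
    """

    detector = {
        "name": "CPU utilization (all environments)",
        "programText": program_text,
        "rules": [
            # Each detectLabel must match a publish() label in program_text.
            {"detectLabel": "cpu_critical", "severity": "Critical", "notifications": []},
            {"detectLabel": "cpu_major", "severity": "Major", "notifications": []},
        ],
    }

    resp = requests.post(
        f"https://api.{REALM}.signalfx.com/v2/detector",
        headers={"X-SF-TOKEN": TOKEN, "Content-Type": "application/json"},
        json=detector,
    )
    resp.raise_for_status()
    print("Created detector:", resp.json()["id"])

Splitting severities and environments into publish labels this way keeps related alerting logic in one detector instead of consuming several of the 1,000 available.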

You should also take advantage of the AutoDetect detectors that Splunk Observability Cloud creates automatically for supported integrations.

Traces and spans

Use the trace and span statistics charts to understand your usage.


The API throttling and entitlement dashboards can also help, but if you have dozens of applications using a single token, finding the root cause of an overage can be difficult. You can open a support ticket for help, but getting an answer can take a few days.
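
If you want to break ingest down by token yourself, you can chart the per-token organization metrics, assuming they are reported in your organization. The sketch below is a rough Python example that creates such a chart through the REST API; the metric name sf.org.apm.numSpansReceivedByToken and the tokenId grouping dimension are assumptions, so verify both in the Metric Finder before relying on them.

    # Rough sketch: a chart that breaks span ingest down by access token,
    # to help spot which token is driving an overage.
    # Assumptions: sf.org.apm.numSpansReceivedByToken exists in your org
    # and is dimensioned by tokenId; adjust if your metadata differs.
    import requests

    REALM = "us1"
    TOKEN = "<org-access-token>"

    chart = {
        "name": "APM spans received, by token",
        "programText": (
            "data('sf.org.apm.numSpansReceivedByToken')"
            ".sum(by=['tokenId']).publish('spans_by_token')"
        ),
        "options": {"type": "TimeSeriesChart"},
    }

    resp = requests.post(
        f"https://api.{REALM}.signalfx.com/v2/chart",
        headers={"X-SF-TOKEN": TOKEN, "Content-Type": "application/json"},
        json=chart,
    )
    resp.raise_for_status()
    print("Created chart:", resp.json()["id"])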

APM license usage

There is no out-of-the-box metric that correlates APM license usage with services. Customers generally use one token for multiple applications, as having one token for each application is too difficult to manage.

Instead, you can open a support ticket asking to enable the sf.org.apm.numTracesReceivedByService metric. This metric:

  • Provides the number of traces sent by service
  • Lets you build chargeback dashboards and detectors
  • Is much more granular than tokens
  • Is easy to manage, doesn't expire, and doesn't require rotation
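
Once support enables the metric, you can build a per-service detector from it in the same way as the detector sketch earlier. This is a minimal sketch, assuming the metric is dimensioned by sf_service and using an arbitrary threshold; confirm the dimension name and choose a threshold that matches your license.

    # Minimal sketch: alert when a single service's trace volume spikes.
    # Assumptions: sf.org.apm.numTracesReceivedByService has been enabled
    # by support, it is dimensioned by sf_service, and 500000 is only a
    # placeholder threshold.
    import requests

    REALM = "us1"
    TOKEN = "<org-access-token>"

    program_text = """
    traces = data('sf.org.apm.numTracesReceivedByService').sum(by=['sf_service'])
    detect(when(traces > 500000, lasting='1h')).publish('service_trace_spike')
    """

    detector = {
        "name": "APM trace volume spike, by service",
        "programText": program_text,
        "rules": [
            {"detectLabel": "service_trace_spike", "severity": "Warning", "notifications": []},
        ],
    }

    resp = requests.post(
        f"https://api.{REALM}.signalfx.com/v2/detector",
        headers={"X-SF-TOKEN": TOKEN, "Content-Type": "application/json"},
        json=detector,
    )
    resp.raise_for_status()
    print("Created detector:", resp.json()["id"])

The same program text, grouped by sf_service in a chart instead of a detector, gives you the chargeback view described above.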


MTS ingest

These limits include the number of active metric time series (MTS) and the MTS creation rate, which are tied to what customers pay for, and a hard license limit that prevents the platform from being overloaded.

Customers sometimes reach these limits when they:

  • Onboard new systems (not too problematic).
  • Update an existing system. Restarting clusters during an update generates new unique IDs in the metric dimensions, so each metric time series is counted twice for a period of time.

Some ways to prevent exceeding the MTS ingest limits are to:

  • Enable histograms.
  • Enable die-fast metrics.
  • Purchase additional license capacity.
  • Stagger releases, manually or with a script (a rough script sketch follows this list). If you do only one release at a time and wait an hour between releases, you are less likely to run into the duplication issue described above.
  • If you are working with Splunk Professional Services, they can open a ticket to temporarily raise the limit. Alternatively, you can open a support ticket with this request.
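
If you want to automate the staggering, a simple wrapper script is enough. The following is a rough sketch, assuming a hypothetical deploy.sh rollout command and a one-hour spacing; substitute your own rollout mechanism and interval.

    # Rough sketch: run releases one at a time with a pause in between so
    # that the duplicate-MTS window from one rollout ages out before the
    # next rollout starts. deploy.sh is a hypothetical command.
    import subprocess
    import time

    deploy_commands = [
        ["./deploy.sh", "cluster-a"],
        ["./deploy.sh", "cluster-b"],
        ["./deploy.sh", "cluster-c"],
    ]

    for i, cmd in enumerate(deploy_commands):
        print(f"Starting release {i + 1} of {len(deploy_commands)}: {' '.join(cmd)}")
        subprocess.run(cmd, check=True)
        if i < len(deploy_commands) - 1:
            # Wait an hour before the next rollout, as suggested above.
            time.sleep(3600)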

API calls

You cannot make more than 10 API calls per minute, so take that limit into account in your scripts so you don't exceed it. Be aware of the following if you script against the API:

  • You will likely have to prevent your laptop from going to sleep to ensure your scripts complete.
  • If you have thousands of queries, you might run into timeouts with this method, so you'll need to handle exceptions in your scripts.
  • Some endpoints limit the number of objects returned in a single response. To work around this, you can use filters, for example, first requesting all objects whose names start with A, then all that start with B, and so on. You can also write a recursive function. The sketch after this list combines these workarounds.
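
The following is a rough sketch that combines these workarounds: a simple call budget to stay under the 10-calls-per-minute limit, basic exception handling for timeouts, and a name-prefix filter to keep each response under the object limit. The /v2/dashboard endpoint, its name filter, and the page size used here are illustrative assumptions; adapt them to the endpoint you actually query and to how its filtering behaves.

    # Rough sketch: throttled, fault-tolerant bulk reads from the API.
    # Assumptions: a 10-calls-per-minute budget, the /v2/dashboard endpoint
    # with "name" and "limit" parameters, and REALM/TOKEN placeholders;
    # adjust all of these for your own use case.
    import string
    import time
    import requests

    REALM = "us1"
    TOKEN = "<org-access-token>"
    BASE = f"https://api.{REALM}.signalfx.com"
    HEADERS = {"X-SF-TOKEN": TOKEN}

    CALL_INTERVAL = 60 / 10  # 10 calls per minute -> one call every 6 seconds

    def throttled_get(path, params):
        """GET with simple throttling, timeouts, and retry on failure."""
        for attempt in range(3):
            try:
                resp = requests.get(BASE + path, headers=HEADERS,
                                    params=params, timeout=30)
                resp.raise_for_status()
                time.sleep(CALL_INTERVAL)  # stay under the per-minute budget
                return resp.json()
            except requests.exceptions.RequestException as err:
                print(f"Attempt {attempt + 1} failed for {path}: {err}")
                time.sleep(CALL_INTERVAL)
        raise RuntimeError(f"Giving up on {path}")

    # Work around per-response object limits by asking for one name prefix
    # at a time (A, then B, and so on), as described above. Objects whose
    # names start with other characters would need additional prefixes.
    results = []
    for prefix in string.ascii_uppercase:
        page = throttled_get("/v2/dashboard", params={"name": prefix, "limit": 200})
        results.extend(page.get("results", []))

    print(f"Fetched {len(results)} objects")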

Next steps

Another limit type you will want to learn to manage in Splunk Observability Cloud is the data() block limit, which is covered in a separate article. In addition, these resources might help you implement the guidance found in this article.