Skip to main content

 

Splunk Lantern

Monitoring Snowflake database usage

​​Snowflake is a popular, managed database that can be used to quickly access private data as well as an extensive collection of publicly available data sources. You need to be able to monitor and troubleshoot data sources and their performance, and answer questions such as:

  • Are queries getting blocked or queued?
  • Should the size of a Snowflake Warehouse be upgraded?
  • Have there been an unusual number of failed logins?
  • Why isn’t the service returning data? Has a query changed and started erroring out?

Data required

Database data

How to use Splunk software for this use case

By using Snowflake dashboards and detectors in the Splunk Observability Cloud content contributors repository you can get answers to these questions, alert on them, and chart trends of other important usage or performance metrics. Download the files from Github, and then use the configuration examples to help you get set up.

You can use the repository to access:

Dashboards

The Snowflake Home Dashboard

The Snowflake Home Dashboard provides you with aggregate overviews of important Snowflake usage metrics like number of warehouses, databases, or users, along with quick summaries of daily cost and total storage usage. This allows you to quickly understand what is going on over your entire Snowflake footprint. In this dashboard the charts shown generally show relevant metrics split by higher-value time periods, such as 24 hours or one hour, to provide you with a high-level understanding that can serve as the starting point for more detailed investigation.

clipboard_e81b63d2ecbb46e8830b03ebdca0cc069.png

The Warehouse, Database, and Schema dashboards

For more detailed information, you can access other dashboards focused on warehouses, databases, and schemas. These allow you to focus on more specific detail with trends over time to help you identify changes in queries, errors, spilling, queuing, and data ingest.

In the example below, the Snowflake Warehouse dashboard shows queries by warehouses, active warehouses, and errors, including queries by warehouse size. You can use this information to see if you have any small warehouses that are getting too many queries or large warehouses that are getting too few queries, at which point you can make adjustments accordingly.

clipboard_ecf239730a452f77ee5623902dc92e2b7.png

The Queries dashboard

To get even more detail, you can access the Queries dashboard. This shows you how many queries have been run or executed, and queries that are blocked, provisioned, or overloading. The last table on this page shows an error breakdown by database and user, and it also includes the error message and the query type. This can be useful in a few scenarios, for example:

  • When you want to find out where the most errors are happening
  • If you see a storm of errors and want to find out who or what might be causing it

clipboard_e5893da6edbaf9f169eaf85635667d86a.png

The Cost dashboard

Snowflake credits are cyclically counted for the current day, and this dashboard helps you see how much you have used. You can see the daily credits used for the day and what they're being used on. There is also a table that shows the credit breakdown by service type and where those credits are being spent.

clipboard_e6a3d2d66eeb0eb377f8c5e828d2010f1.png

The Security dashboard

This dashboard shows total logins in your environment, failed logins, and sessions split by user. This can help you identify any divergent trends that may need immediate attention.

clipboard_e97360eb5d8e8bc7201800ba49999738d.png

Detectors

The Golden Signals are helpful metrics to track for various software services. The Detectors for Snowflake provided in the Observability Content Contributor repository follow the Golden Signals pattern of looking at Latency, Errors, Traffic, and Saturation (L.E.T.S.) along with a couple of billing related detectors. These detectors are a helpful head start and can have their thresholds easily adjusted to match expectations in your own environment.

Latency

  • Long queries in small / extra-small warehouses (usually a sign you may want to upgrade your Warehouse size)
  • Queries taking longer than 15 minutes

Errors

  • DB errors total
  • DB error rate
  • Login failure rate by user

Traffic

  • Blocked queries by warehouse
  • No queries seen in last X hours

Saturation

  • Overloaded queries
  • Queries queued for over X seconds

Billing

  • Credits used by warehouse anomaly detection
  • Warehouses with high cloud services cost

Using these detectors as examples will help you identify what issues might be occurring during an incident, and whether they are related to Snowflake or other elements of your stack.

OpenTelemetry Agent Configuration

Snowflake Receiver (OpenTelemetry Contrib)

Set up according to receiver documentation or use the provided configurations as guidance:
  1. agent_config.yaml Contains receiver, exporter, pipeline configuration
    • The receiver entries for Snowflake can be found under snowflake
  2. splunk-otel-collector.conf Contains referenced variables like snowflake username/password, Splunk Observability Cloud token, etc.

Smart Agent Receiver (Legacy)

  1. agent_config.yaml Contains receiver, exporter, pipeline configuration
    • The receiver entries for Snowflake can be found under smartagent/sql
  2. splunk-otel-collector.conf Contains referenced variables like snowflake username/password, Splunk Observability Cloud token, etc.
  3. snowflake-metrics.yaml Contains SQL queries and mappings for Splunk Observability Cloud metrics and dimensions pulling metrics from the Snowflake SNOWFLAKE/ACCOUNT_USAGE internal view

Next steps

These resources might help you understand and implement this guidance:

  • Splunk OnDemand Services: Use these credit-based services for direct access to Splunk technical consultants with a variety of technical services from a pre-defined catalog. Most customers have OnDemand Services per their Success Plan. Engage the ODS team at ondemand@cisco.com if you would like assistance.