Skip to main content
Splunk Lantern

Monitoring Snowflake database usage

 

​​Snowflake is a popular, managed database that can be used to quickly access private data as well as an extensive collection of publicly available data sources. You need to be able to monitor and troubleshoot data sources and their performance, and answer questions such as:

  • Are queries getting blocked or queued?
  • Should the size of a Snowflake Warehouse be upgraded?
  • Have there been an unusual number of failed logins?
  • Why isn’t the service returning data? Has a query changed and started erroring out?

Data required

Database data

How to use Splunk software for this use case

By using Snowflake dashboards and detectors in the Splunk Observability Cloud content contributors repository you can get answers to these questions, alert on them, and chart trends of other important usage or performance metrics. Download the files from Github, and then use the configuration examples to help you get set up.

You can use the repository to access:

Dashboards

The Snowflake Home Dashboard

The Snowflake Home Dashboard provides you with aggregate overviews of important Snowflake usage metrics like number of warehouses, databases, or users, along with quick summaries of daily cost and total storage usage. This allows you to quickly understand what is going on over your entire Snowflake footprint. In this dashboard the charts shown generally show relevant metrics split by higher-value time periods, such as 24 hours or one hour, to provide you with a high-level understanding that can serve as the starting point for more detailed investigation.

clipboard_e81b63d2ecbb46e8830b03ebdca0cc069.png

The Warehouse, Database, and Schema dashboards

For more detailed information, you can access other dashboards focused on warehouses, databases, and schemas. These allow you to focus on more specific detail with trends over time to help you identify changes in queries, errors, spilling, queuing, and data ingest. 

In the example below, the Snowflake Warehouse dashboard shows queries by warehouses, active warehouses, and errors, including queries by warehouse size. You can use this information to see if you have any small warehouses that are getting too many queries or large warehouses that are getting too few queries, at which point you can make adjustments accordingly.

clipboard_ecf239730a452f77ee5623902dc92e2b7.png

The Queries dashboard

To get even more detail, you can access the Queries dashboard. This shows you how many queries have been run or executed, and queries that are blocked, provisioned, or overloading. The last table on this page shows an error breakdown by database and user, and it also includes the error message and the query type. This can be useful in a few scenarios, for example:

  • When you want to find out where the most errors are happening
  • If you see a storm of errors and want to find out who or what might be causing it

clipboard_e5893da6edbaf9f169eaf85635667d86a.png

The Cost dashboard

Snowflake credits are cyclically counted for the current day, and this dashboard helps you see how much you have used. You can see the daily credits used for the day and what they're being used on. There is also a table that shows the credit breakdown by service type and where those credits are being spent.

clipboard_e6a3d2d66eeb0eb377f8c5e828d2010f1.png

The Security dashboard

This dashboard shows total logins in your environment, failed logins, and sessions split by user. This can help you identify any divergent trends that may need immediate attention.

clipboard_e97360eb5d8e8bc7201800ba49999738d.png

Detectors

The Golden Signals are helpful metrics to track for various software services. The Detectors for Snowflake provided in the Observability Content Contributor repository follow the Golden Signals pattern of looking at Latency, Errors, Traffic, and Saturation (L.E.T.S.) along with a couple of billing related detectors. These detectors are a helpful head start and can have their thresholds easily adjusted to match expectations in your own environment.

Latency

  • Long queries in small / extra-small warehouses (usually a sign you may want to upgrade your Warehouse size)
  • Queries taking longer than 15 minutes 

Errors

  • DB errors total 
  • DB error rate
  • Login failure rate by user

Traffic

  • Blocked queries by warehouse
  • No queries seen in last X hours

Saturation

  • Overloaded queries
  • Queries queued for over X seconds

Billing 

  • Credits used by warehouse anomaly detection
  • Warehouses with high cloud services cost

Using these detectors as examples will help you identify what issues might be occurring during an incident, and whether they are related to Snowflake or other elements of your stack.

Configuration examples

These examples expect you are using the splunk-otel-collector, but these examples also work with any other OTEL configuration. These are only examples. Your configuration will likely be slightly different.

  1. Agent_config.yaml. Contains receiver, exporter, pipeline configuration. The receiver entries for Snowflake can be found under smartagent/sql
    • You must add your Snowflake account to this config where account is taken from this format: <account>.snowflakecomputing.com
    • If you plan to use a custom role rather than ACCOUNTADMIN you will need to add your role to the config.  
    • Resolution of 3600 seconds (1 hour) is recommended due to the latency between actions happening and showing up in the SNOWFLAKE/ACCOUNT_USAGE DB view. It is possible to collect at a higher interval, but it is not recommended.
  2. Splunk-otel-collector.conf. Contains referenced variables like snowflake username / password, and Splunk Observability token, etc.
    • Add your Splunk Observability token in SPLUNK_ACCESS_TOKEN
    • Add your Snowflake User to SNOWFLAKE_USER (the user must have a role that allows access to the SNOWFLAKE/ACCOUNT_USAGE db view).
    • Add the password for your Snowflake user account to SNOWFLAKE_PASS
  3. Snowflake-metrics.yaml. Contains SQL queries and mappings for our Splunk Observability metrics and dimensions.
    • The snowflake-other-metrics.yaml file contains SQL queries for detailed and high cardinality DB query metrics, including the query_id dimension which is a GUID.
    • When using these metrics, replace the DB Metrics in snowflake-metrics.yaml.
    • Billing usage is shown in US dollars.

Next steps

These resources might help you understand and implement this guidance: