Monitoring Snowflake database usage
Snowflake is a popular, managed database that can be used to quickly access private data as well as an extensive collection of publicly available data sources. You need to be able to monitor and troubleshoot data sources and their performance, and answer questions such as:
- Are queries getting blocked or queued?
- Should the size of a Snowflake Warehouse be upgraded?
- Have there been an unusual number of failed logins?
- Why isn’t the service returning data? Has a query changed and started erroring out?
Data required
Snowflake data
How to use Splunk software for this use case
By using the Snowflake dashboards and detectors in the Splunk Observability Cloud content contributors repository, you can get answers to these questions, alert on them, and chart trends of other important usage and performance metrics. Download the files from GitHub, and then use the configuration examples to help you get set up.
You can use the repository to access:
- 7 Dashboards
- 11 Detectors for common Golden Signals-related and billing concerns
- Configuration examples for getting data from Snowflake into Splunk Observability Cloud through the OSS OpenTelemetry Collector
Dashboards
The Snowflake Home Dashboard
The Snowflake Home Dashboard provides aggregate overviews of important Snowflake usage metrics, such as the number of warehouses, databases, or users, along with quick summaries of daily cost and total storage usage. This allows you to quickly understand what is going on across your entire Snowflake footprint. The charts in this dashboard generally show relevant metrics over longer time windows, such as one hour or 24 hours, to provide a high-level understanding that can serve as the starting point for more detailed investigation.
The Warehouse, Database, and Schema dashboards
For more detailed information, you can access other dashboards focused on warehouses, databases, and schemas. These allow you to focus on more specific detail with trends over time to help you identify changes in queries, errors, spilling, queuing, and data ingest.
In the example below, the Snowflake Warehouse dashboard shows queries by warehouse, active warehouses, and errors, as well as queries by warehouse size. You can use this information to see if you have any small warehouses that are getting too many queries or large warehouses that are getting too few queries, at which point you can make adjustments accordingly.
The Queries dashboard
To get even more detail, you can access the Queries dashboard. This shows you how many queries have been executed, and which queries are blocked, provisioned, or overloaded. The last table on this page shows an error breakdown by database and user, and it also includes the error message and the query type. This can be useful in a few scenarios, for example:
- When you want to find out where the most errors are happening
- If you see a storm of errors and want to find out who or what might be causing it
The Cost dashboard
Snowflake credits are counted on a daily cycle, and this dashboard helps you see how many you have used. You can see the credits used for the current day and what they are being used on. There is also a table that shows the credit breakdown by service type and where those credits are being spent.
The Security dashboard
This dashboard shows total logins in your environment, failed logins, and sessions split by user. This can help you identify any divergent trends that may need immediate attention.
Detectors
The Golden Signals are helpful metrics to track for various software services. The detectors for Snowflake provided in the Observability Content Contributor repository follow the Golden Signals pattern of looking at Latency, Errors, Traffic, and Saturation (L.E.T.S.), along with a couple of billing-related detectors. These detectors give you a helpful head start, and you can easily adjust their thresholds to match expectations in your own environment.
Latency
- Long queries in small / extra-small warehouses (usually a sign you may want to upgrade your Warehouse size)
- Queries taking longer than 15 minutes
Errors
- DB errors total
- DB error rate
- Login failure rate by user
Traffic
- Blocked queries by warehouse
- No queries seen in last X hours
Saturation
- Overloaded queries
- Queries queued for over X seconds
Billing
- Credits used by warehouse anomaly detection
- Warehouses with high cloud services cost
Using these detectors as examples will help you identify what issues might be occurring during an incident, and whether they are related to Snowflake or other elements of your stack.
Configuration examples
These examples assume you are using the splunk-otel-collector, but they also work with any other OpenTelemetry Collector configuration. These are only examples; your configuration will likely be slightly different.
- Agent_config.yaml. Contains the receiver, exporter, and pipeline configuration. The receiver entries for Snowflake can be found under smartagent/sql.
  - You must add your Snowflake account to this config, where the account is taken from this format: <account>.snowflakecomputing.com
  - If you plan to use a custom role rather than ACCOUNTADMIN, you will need to add your role to the config.
  - A resolution of 3600 seconds (1 hour) is recommended due to the latency between actions occurring and showing up in the SNOWFLAKE/ACCOUNT_USAGE DB view. It is possible to collect more frequently, but it is not recommended.
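As a rough sketch of what the Snowflake receiver entry can look like, the fragment below follows the smartagent/sql monitor's configuration style. The account placeholder, param names, and connection string format are illustrative assumptions, so check the agent_config.yaml shipped in the repository for the exact syntax your version expects.

```yaml
# Hedged sketch of a Snowflake receiver entry; verify key names against
# the repository's agent_config.yaml before using.
receivers:
  smartagent/sql:
    type: sql
    dbDriver: snowflake
    params:
      username: "${SNOWFLAKE_USER}"
      password: "${SNOWFLAKE_PASS}"
      # <account> is the account portion of <account>.snowflakecomputing.com
      account: "<account>"
    connectionString: "{{.username}}:{{.password}}@{{.account}}/SNOWFLAKE"
    # 3600 seconds (1 hour) is recommended because ACCOUNT_USAGE views
    # lag behind real activity.
    intervalSeconds: 3600
```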
- Splunk-otel-collector.conf. Contains referenced variables such as the Snowflake username and password, the Splunk Observability Cloud token, and so on.
  - Add your Splunk Observability Cloud token to SPLUNK_ACCESS_TOKEN.
  - Add your Snowflake user to SNOWFLAKE_USER (the user must have a role that allows access to the SNOWFLAKE/ACCOUNT_USAGE DB view).
  - Add the password for your Snowflake user account to SNOWFLAKE_PASS.
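The environment file itself is small. A sketch of what Splunk-otel-collector.conf might contain is shown below; the values are placeholders, and the realm variable is an assumption that depends on your deployment.

```shell
# Environment variables referenced by the collector; all values are placeholders.
SPLUNK_ACCESS_TOKEN=<your-observability-access-token>
SPLUNK_REALM=<your-realm, for example us0>
SNOWFLAKE_USER=<user with a role that can read SNOWFLAKE/ACCOUNT_USAGE>
SNOWFLAKE_PASS=<password for that user>
```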
- Snowflake-metrics.yaml. Contains the SQL queries and mappings for the Splunk Observability Cloud metrics and dimensions.
  - The snowflake-other-metrics.yaml file contains SQL queries for detailed, high-cardinality DB query metrics, including the query_id dimension, which is a GUID. When using these metrics, replace the DB Metrics in snowflake-metrics.yaml.
  - Billing usage is shown in US dollars.
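To illustrate the shape of a query-to-metric mapping, here is a hedged sketch of one entry in the smartagent/sql style. The metric name is hypothetical, and the actual queries in snowflake-metrics.yaml are more extensive.

```yaml
# Hedged sketch of one query-to-metric mapping; the metric name below is
# a hypothetical example, not the repository's actual name.
queries:
  - query: >-
      SELECT WAREHOUSE_NAME, SUM(CREDITS_USED) AS CREDITS_USED
      FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
      WHERE START_TIME >= DATEADD(hour, -1, CURRENT_TIMESTAMP())
      GROUP BY WAREHOUSE_NAME
    metrics:
      - metricName: "snowflake.credits_used"
        valueColumn: "CREDITS_USED"
        dimensionColumns: ["WAREHOUSE_NAME"]
```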
Next steps
These resources might help you understand and implement this guidance:
- GitHub: SnowflakeDB
- Docs: JDBC Driver for Snowflake