Top ten highest network utilization queries
Queries that result in large transfers of data into or out of the database may be a sign of an inefficient design and can place a strain on the networking infrastructure between the client and the database. You want to create a list of the top ten so that you can work with the owners to refactor such queries.
Data required
Procedure
- Verify that you have deployed Splunk Stream.
- Run the following search. You can optimize it by specifying an index and adjusting the time range.
|stats count sum(bytes) AS total_bytes BY query |eval avg_bytes_per_execution = round(total_bytes/count, 2) |sort - total_bytes |head 10
Search explanation
The table provides an explanation of what each part of this search achieves. You can adjust this query based on the specifics of your environment.
Splunk Search | Explanation |
---|---|
|stats count sum(bytes) AS total_bytes BY query |
Count each query, as well as the total bytes used by that query. |
|eval avg_bytes_per_execution = round(total_bytes/count, 2) |
Return the average bytes per query execution (rounded to two decimal points) by dividing the total bytes by the number of queries. |
|sort - total_bytes |
Sort the results according to total bytes with the largest value first. |
|head 10 |
Display the top 10 results in a table. |
Next steps
The output from the query includes the text for each of the SQL statements, the count of the number of times the statement appears, the total bytes in the result set, and the average number of bytes for each statement. The following table is a sample result.
query | count | total_bytes | avg_bytes_per_execution |
---|---|---|---|
SELECT * FROM hr_employee_info |
1700 |
16994436 |
9996.73 |
SELECT password FROM SYS.USER$ WHERE name='SYS' |
887 |
678370 |
764.79 |
SELECT * FROM customers WHERE customer_uid=3bf48b89-77b0-4cc2-ab5b-afbe08dc1e96 |
22 |
52722 |
2396.45 |
SELECT * FROM customers WHERE customer_uid=b56a5c74-87a3-4593-9eb5-2d2b1c38a0d8 |
22 |
45145 |
2052.05 |
This sample output is telling in several ways. First, large result sets, such as the SELECT * FROM result in the first row, should not be expected. If they occur often, they might be indicative of inefficient SQL. Owners should be notified when their queries are in the top ten largest data transfer queries. When possible, and if appropriate, owners should refactor the queries to transfer less data. They may also indicate a security concern if the data is of a sensitive nature, such as human resources data and passwords. The example above shows both.
Potential next steps are to correlate this database activity with applications and users to put it into a business context. If user or application id is in the stream, you could add it to the split by clause of the stats command like this:
| stats count sum(bytes) AS total_bytes BY query user
Lastly, you could vary the sort by fields to look at count or avg_bytes_per_execution and use the results to determine the most beneficial troubleshooting steps to take.
Finally, you might also want to look at similar searches in our article Analyzing wire data from databases.