Top ten slowest database queries
Slow database queries often result in slow response times for the upstream consumers of the data, leading to user dissatisfaction or SLA violations. You want to identify the worst-performing queries so you can investigate them.
Data required
Splunk Stream
Procedure
- Verify that you have deployed Splunk Stream.
- Run the following search. You can optimize it by specifying an index and adjusting the time range.
```
|eval time_taken_sec = time_taken / 1000000
|stats count sum(time_taken_sec) AS total_time_taken_sec avg(time_taken_sec) AS avg_time_taken_sec perc95(time_taken_sec) AS perc95_time_taken_sec max(time_taken_sec) AS max_time_taken_sec BY query
|sort - perc95_time_taken_sec
|head 10
|table query count *_time_taken_sec
|foreach *_time_taken_sec [eval <<FIELD>>=round('<<FIELD>>', 3)]
```
Search explanation
The table provides an explanation of what each part of this search achieves. You can adjust this query based on the specifics of your environment.
| Splunk Search | Explanation |
|---|---|
| `\|eval time_taken_sec = time_taken / 1000000` | Convert microseconds to seconds for better readability. |
| `\|stats count sum(time_taken_sec) AS total_time_taken_sec avg(time_taken_sec) AS avg_time_taken_sec perc95(time_taken_sec) AS perc95_time_taken_sec max(time_taken_sec) AS max_time_taken_sec BY query` | Calculate the count, total time taken, average time taken, 95th percentile of times taken, and maximum time taken (all in seconds) for each database query. |
| `\|sort - perc95_time_taken_sec` | Sort the results by the 95th percentile, with the largest value first. |
| `\|head 10` | Keep only the top 10 results. |
| `\|table query count *_time_taken_sec` | Display the results in a table with columns in the order shown. |
| `\|foreach *_time_taken_sec [eval <<FIELD>>=round('<<FIELD>>', 3)]` | Round each of the time fields to three decimal places. |
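To see concretely what each stage of the pipeline computes, the same aggregation can be sketched outside Splunk. The following is a hypothetical Python approximation over made-up events; the field names match the search, but Splunk's `perc95` may use a slightly different percentile method than `statistics.quantiles`.

```python
# Hypothetical sketch of the SPL pipeline, using made-up stream
# events that carry `query` and `time_taken` (in microseconds).
from collections import defaultdict
from statistics import quantiles

events = [
    {"query": "SELECT * FROM customers", "time_taken": 1_080_000},
    {"query": "SELECT * FROM customers", "time_taken": 1_660_000},
    {"query": "SELECT * FROM hr_employee_info", "time_taken": 40_000},
    {"query": "SELECT * FROM hr_employee_info", "time_taken": 60_000},
]

# |eval time_taken_sec = time_taken / 1000000
groups = defaultdict(list)
for e in events:
    groups[e["query"]].append(e["time_taken"] / 1_000_000)

# |stats count sum(...) avg(...) perc95(...) max(...) BY query
rows = []
for query, secs in groups.items():
    # "inclusive" keeps the 95th percentile within the observed range.
    p95 = quantiles(secs, n=100, method="inclusive")[94] if len(secs) > 1 else secs[0]
    rows.append({
        "query": query,
        "count": len(secs),
        "total_time_taken_sec": round(sum(secs), 3),
        "avg_time_taken_sec": round(sum(secs) / len(secs), 3),
        "perc95_time_taken_sec": round(p95, 3),
        "max_time_taken_sec": round(max(secs), 3),
    })

# |sort - perc95_time_taken_sec |head 10
rows.sort(key=lambda r: r["perc95_time_taken_sec"], reverse=True)
top10 = rows[:10]
```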
Next steps
The following is a sample result of this search.
| query | count | avg_time_taken_sec | max_time_taken_sec | perc95_time_taken_sec | total_time_taken_sec |
|---|---|---|---|---|---|
| SELECT * FROM customers WHERE customer_id=uid | 7 | 1.08 | 1.66 | 1.61 | 7.55 |
| SELECT password FROM SYS.USER$ WHERE name='SYS' | 887 | 0.04 | 0.06 | 0.06 | 39.85 |
| SELECT * FROM hr_employee_info | 1700 | 0.04 | 0.06 | 0.06 | 76.35 |
The output shows the longest-running SQL statements, sorted by the 95th percentile column, so you can pinpoint which statements take the most time. It also shows how frequently each statement runs. The longest-running query does not necessarily have the highest total time, because it may have a low count value. The worst case is a query that is both the longest running and the most frequently executed.
Another helpful observation is total time taken. In the sample results, the highest total time comes from the 1,700 executions of the shown SELECT statement, at just over 76 seconds. This may be an opportunity for optimization on the client side, such as reducing calls by caching the results on the client. The call may also be misplaced in the client code and invoked more frequently than necessary.
Performance profiling of SQL statements is also commonly available in the database itself. Profiling the statements in Splunk as well lets you group them with other related performance metrics such as CPU, memory, storage, middleware, network, and web interface from all the components that make up the service or application. You can put all these results side by side on a single dashboard for more insight than any one of these measurements provides alone.
Finally, you might also want to look at similar searches in our article Analyzing wire data from databases.