Splunk Lantern

Top ten slowest database queries

 

Slow database queries often result in slow response times for the upstream consumer of the information, resulting in user dissatisfaction or SLA violations. You want to identify the worst performing queries so you can investigate them.

Data required 

Database data

Procedure

  1. Verify that you have deployed Splunk Stream.
  2. Run the following search. You can optimize it by specifying an index and adjusting the time range.
|eval time_taken_sec = time_taken / 1000000
|stats count sum(time_taken_sec) AS total_time_taken_sec avg(time_taken_sec) AS avg_time_taken_sec perc95(time_taken_sec) AS perc95_time_taken_sec max(time_taken_sec) AS max_time_taken_sec BY query
|sort - perc95_time_taken_sec
|head 10
|table query count *_time_taken_sec
|foreach *_time_taken_sec [eval <<FIELD>>=round('<<FIELD>>', 3)]

Search explanation

The table provides an explanation of what each part of this search achieves. You can adjust this query based on the specifics of your environment.


|eval time_taken_sec = time_taken / 1000000

Convert microseconds to seconds for better readability.

|stats count sum(time_taken_sec) AS total_time_taken_sec avg(time_taken_sec) AS avg_time_taken_sec perc95(time_taken_sec) AS perc95_time_taken_sec max(time_taken_sec) AS max_time_taken_sec BY query

Calculate the total time taken, average time taken, max time taken, and the 95th percentile of times taken (all in seconds) for each database query.

|sort - perc95_time_taken_sec

Sort the results according to the 95th percentile with the largest value first.

|head 10

Display the top 10 results in a table.

|table query count *_time_taken_sec

Display the results in a table with columns in the order shown.

|foreach *_time_taken_sec [eval <<FIELD>>=round('<<FIELD>>', 3)]

Round the results to three decimal places.
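As an illustrative sketch (not Splunk itself), the same pipeline can be reproduced in Python over a hypothetical list of (query, time_taken) events, where time_taken is in microseconds as captured by Splunk Stream. The percentile here is a simple nearest-rank calculation, so it can differ slightly from Splunk's perc95 interpolation:

```python
from collections import defaultdict

# Hypothetical sample events: (query, time_taken in microseconds).
events = [
    ("SELECT * FROM customers WHERE customer_id=uid", 1_080_000),
    ("SELECT * FROM customers WHERE customer_id=uid", 1_660_000),
    ("SELECT * FROM hr_employee_info", 40_000),
    ("SELECT * FROM hr_employee_info", 60_000),
]

def perc95(values):
    # Simple nearest-rank 95th percentile; Splunk's perc95 uses its own
    # interpolation, so results can differ slightly on small samples.
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]

# |eval time_taken_sec = time_taken / 1000000, then |stats ... BY query
by_query = defaultdict(list)
for query, time_taken in events:
    by_query[query].append(time_taken / 1_000_000)

rows = [
    {
        "query": q,
        "count": len(secs),
        "total_time_taken_sec": round(sum(secs), 3),
        "avg_time_taken_sec": round(sum(secs) / len(secs), 3),
        "perc95_time_taken_sec": round(perc95(secs), 3),
        "max_time_taken_sec": round(max(secs), 3),
    }
    for q, secs in by_query.items()
]

# |sort - perc95_time_taken_sec, then |head 10
rows.sort(key=lambda r: r["perc95_time_taken_sec"], reverse=True)
top = rows[:10]
```

The grouping mirrors the BY clause of stats, and the final sort plus slice mirrors the sort and head commands.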

Next steps

The following is a sample result of this search. 

| query | count | avg_time_taken_sec | max_time_taken_sec | perc95_time_taken_sec | total_time_taken_sec |
|---|---|---|---|---|---|
| SELECT * FROM customers WHERE customer_id=uid | 7 | 1.08 | 1.66 | 1.61 | 7.55 |
| SELECT password FROM SYS.USER$ WHERE name='SYS' | 887 | 0.04 | 0.06 | 0.06 | 39.85 |
| SELECT * FROM hr_employee_info | 1700 | 0.04 | 0.06 | 0.06 | 76.35 |

The output shows the longest-running SQL statements, sorted by the 95th percentile column, so you can pinpoint which statements take the most time. It also shows how often each statement runs. The longest-running query may not have the highest total time if its count is low; the worst case is a query that is both the longest running and the most frequently executed.
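A quick calculation over the sample results shows how frequency drives total time; because the averages shown are rounded to two decimals, the products only approximate the reported totals:

```python
# From the sample results: a fast query that runs often can consume
# more total time than the slowest individual query.
slow_but_rare = {"avg_sec": 1.08, "count": 7}         # customers lookup
fast_but_frequent = {"avg_sec": 0.04, "count": 1700}  # hr_employee_info

total_rare = slow_but_rare["avg_sec"] * slow_but_rare["count"]
total_frequent = fast_but_frequent["avg_sec"] * fast_but_frequent["count"]

print(round(total_rare, 2), round(total_frequent, 1))  # roughly 7.56 and 68.0
assert total_frequent > total_rare
```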

Another helpful observation is total time taken. In the sample results, the highest total time comes from the 1,700 executions of the SELECT statement shown, at just over 76 seconds. This may be a candidate for optimization on the client side, such as reducing the calls by caching the results on the client. It may also be that the call is misplaced in the client code and is made more frequently than necessary.

Performance profiling of SQL statements is also commonly available in the database itself. Profiling the statements in Splunk as well lets you group them with related performance metrics such as CPU, memory, storage, middleware, network, and web interface metrics from all the components that make up the service or application. You can put all these results side by side on a single dashboard for more insight than any one of these measurements alone.

Finally, you might also want to look at similar searches in our article Analyzing wire data from databases.