Synthetic checks for URL response times


Certain websites and URLs, both internal and external, are critical for employees and customers. Using basic synthetic checks to confirm that URLs respond quickly, or within an expected SLA, can help you detect problems before they are reported to the help desk.

Data required

Web server data

Procedure

Verify that the Website Monitoring app is installed on your search head or heavy forwarder, whichever can reach the URLs to be checked.
Using Create Inputs in the navigation bar of the Website Monitoring app, create one or more inputs. Adjust additional configurations as needed, such as which index to send data to, whether to use proxy servers, and which results constitute failure. For more information, see the Wiki for this project.
Run the following search. You can optimize it by specifying an index and adjusting the time range.

index="<name of web ping index>" sourcetype="web_ping" title=*
| eval response_code=(case((timed_out == "True"),"Connection timed out",(isnull(response_code) OR (response_code == "")),"Connection failed",true(),response_code) . coalesce((" " . has_expected_string),""))
| eval error_threshold=coalesce(error_threshold,1000), warning_threshold=coalesce(warning_threshold,800), status=case(((((((((response_code >= 400) OR (total_time >= error_threshold)) OR (response_code == "*false")) OR (response_code == "Connection failed")) OR (response_code == "Connection timed out")) OR (timed_out == True)) OR (response_code == "")) OR (has_expected_string == "false")),"Failed",(total_time >= warning_threshold),"Warning",true(),"OK") 
| stats sparkline(avg(total_time)) AS response_time_trend_ms avg(total_time) AS avg_response_time_ms max(total_time) AS max_response_time_ms latest(response_code) AS response_code latest(status) AS status by title, url

Search explanation

The table provides an explanation of what each part of this search achieves. You can adjust this query based on the specifics of your environment.

Splunk Search	Explanation
`index="<name of web ping index>" sourcetype="web_ping" title=*`	Search the index where the Website Monitoring app stores results from synthetic checks. For the index name, you can use the macro `website_monitoring_search_index`, which ships with the Website Monitoring app. Optionally, replace `title=*` with a narrower filter on the specific URLs you want to view and alert on, as seen in the demo data SPL.
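`\| eval response_code=(case((timed_out == "True"),"Connection timed out",(isnull(response_code) OR (response_code == "")),"Connection failed",true(),response_code) . coalesce((" " . has_expected_string),""))` `\| eval error_threshold=coalesce(error_threshold,1000), warning_threshold=coalesce(warning_threshold,800), status=case(((((((((response_code >= 400) OR (total_time >= error_threshold)) OR (response_code == "*false")) OR (response_code == "Connection failed")) OR (response_code == "Connection timed out")) OR (timed_out == True)) OR (response_code == "")) OR (has_expected_string == "false")),"Failed",(total_time >= warning_threshold),"Warning",true(),"OK")`	Rewrite `response_code` so that timeouts read "Connection timed out" and missing codes read "Connection failed", appending the `has_expected_string` value when it is present. Then default `error_threshold` to 1000 and `warning_threshold` to 800 when the event does not carry them, and set `status` to "Failed" for response codes of 400 or higher, timeouts, failed connections, a missing expected string, or a `total_time` at or above the error threshold; to "Warning" for a `total_time` at or above the warning threshold; and to "OK" otherwise.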
`\| stats sparkline(avg(total_time)) AS response_time_trend_ms avg(total_time) AS avg_response_time_ms max(total_time) AS max_response_time_ms latest(response_code) AS response_code latest(status) AS status BY title, url`	Format the results in a table to see how each check is performing, renaming the fields as shown and using a sparkline to show trends.
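
To make the threshold logic easier to reason about outside of Splunk, the following is a minimal Python sketch of how the `status` value is derived. It is an illustration only, not code from the Website Monitoring app: the function name `classify_check` and its parameters are hypothetical, the response code is treated as a plain integer rather than the combined string the search builds, and `total_time` is assumed to be in milliseconds, as the `_ms` field names in the search suggest.

```python
from typing import Optional


def classify_check(
    total_time: float,
    response_code: Optional[int] = None,
    timed_out: bool = False,
    has_expected_string: Optional[bool] = None,
    error_threshold: Optional[float] = None,
    warning_threshold: Optional[float] = None,
) -> str:
    """Return "Failed", "Warning", or "OK" for one synthetic check result."""
    # Mirror coalesce(error_threshold,1000) and coalesce(warning_threshold,800).
    error_threshold = 1000 if error_threshold is None else error_threshold
    warning_threshold = 800 if warning_threshold is None else warning_threshold

    # Any single failure condition marks the check as failed.
    failed = (
        timed_out                          # the request timed out
        or response_code is None           # no response code: connection failed
        or response_code >= 400            # HTTP client or server error
        or has_expected_string is False    # expected content was not found
        or total_time >= error_threshold   # too slow to be acceptable
    )
    if failed:
        return "Failed"
    if total_time >= warning_threshold:
        return "Warning"
    return "OK"


if __name__ == "__main__":
    print(classify_check(total_time=120, response_code=200))  # OK
    print(classify_check(total_time=850, response_code=200))  # Warning
    print(classify_check(total_time=300, response_code=503))  # Failed
```

Because the failure conditions are evaluated first, a slow response that also returns an error code is reported as "Failed" rather than "Warning", which matches the ordering of the `case()` branches in the search.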

Next steps

To alert when a synthetic check takes too long, you can use the SPL in this procedure to configure an alert. You can filter the most recent results in several different ways to obtain the list of URLs that require action, but the simplest recommendation is to add | where status!="OK" to the end of the SPL to alert on any URL that is taking too long, timing out, or returning an unexpected status code. The quotation marks matter: without them, the where command compares status against a field named OK rather than the literal string.
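
As a rough illustration of what that filter leaves behind, the Python sketch below keeps only the rows whose status is not "OK" and lists failures before warnings. The function name `urls_requiring_action` and the sample rows are hypothetical and only stand in for the per-URL output of the search.

```python
from typing import Dict, List


def urls_requiring_action(results: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep rows whose status is not "OK", listing "Failed" before "Warning"."""
    severity = {"Failed": 0, "Warning": 1}
    flagged = [row for row in results if row.get("status") != "OK"]
    # Rows with a missing or unrecognized status are kept and sorted last.
    return sorted(flagged, key=lambda row: severity.get(row.get("status"), 2))


if __name__ == "__main__":
    # Hypothetical rows shaped like the search output (title, url, status).
    sample = [
        {"title": "Intranet", "url": "https://intranet.example.com", "status": "OK"},
        {"title": "Store", "url": "https://store.example.com", "status": "Warning"},
        {"title": "Login", "url": "https://login.example.com", "status": "Failed"},
    ]
    for row in urls_requiring_action(sample):
        print(row["status"], row["title"], row["url"])
```

Sorting by severity is a design choice for readability in an alert message; the where clause itself only filters and does not reorder results.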

Finally, you might be interested in other processes associated with the Managing web server performance use case.