Splunk Lantern

Web access and web error log correlation

 

While web access logs tell you when users experience errors and for which page requests, error logs indicate why the problem occurred. When these log sources are correlated, it may become evident that certain errors occur only for specific pages, browsers, tenants, or some other class of users. You want to correlate these two data sources for a clearer understanding of the impact specific errors have on users. 

Data required 

Web server data

Procedure 

  1. Ensure you are ingesting web server data. This sample search uses the Splunk Add-on for Apache Web Server, but you can replace this source with any other web server data used in your organization. For more information, see About installing Splunk add-ons.
  2. Run the following search. You can optimize it by specifying an index and adjusting the time range.
tag=web OR tag=error (sourcetype=apache:error OR (sourcetype=apache:access status>299))
|eval status_group=case(status<300, "2xx", status<400, "3xx", status<500, "4xx", status<600, "5xx", true(), "unknown")
|eval log_event_type = if(sourcetype="apache:error", "apache_error", status_group)
|timechart span=1h count BY log_event_type

Search explanation

The table provides an explanation of what each part of this search achieves. You can adjust this query based on the specifics of your environment.

Splunk Search Explanation

tag=web OR tag=error

Search for events that are tagged as web or error events.

(sourcetype=apache:error OR (sourcetype=apache:access status>299))

Search only Apache error log events and access log events with an HTTP status code greater than 299, that is, redirects and client or server errors.

If you are using a web server other than Apache, update the fields as necessary.

|eval status_group=case(status<300, "2xx", status<400, "3xx", status<500, "4xx", status<600, "5xx", true(), "unknown")

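Group status codes together by 200s, 300s, 400s, and 500s, labeling anything else as unknown. Because the base search keeps only access events with status>299, the 2xx group does not appear in the results.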

|eval log_event_type = if(sourcetype="apache:error", "apache_error", status_group)

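Separate the web access logs from the error logs: events from the error log are labeled apache_error, and access log events keep their status group as the label.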

If you are using a web server other than Apache, update the fields as necessary.

|timechart span=1h count BY log_event_type

Graph the different error types over time in 1-hour increments.
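
Step 2 of the procedure notes that the search can be optimized by specifying an index and adjusting the time range. The following is a minimal sketch of what that might look like. The index name web and the 24-hour window are assumptions for illustration only; substitute the index and time range used in your environment.

```spl
index=web earliest=-24h latest=now (tag=web OR tag=error) (sourcetype=apache:error OR (sourcetype=apache:access status>299))
| eval status_group=case(status<300, "2xx", status<400, "3xx", status<500, "4xx", status<600, "5xx", true(), "unknown")
| eval log_event_type=if(sourcetype="apache:error", "apache_error", status_group)
| timechart span=1h count BY log_event_type
```

Restricting the index and time range in the first clause limits how many events are read before the eval and timechart commands run, which is usually where most of the savings come from.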

Next steps

If not every log_event_type has a corresponding entry in the error log, check the log level configured for the error log. For instance, Apache does not log errors for pages if its LogLevel is set to warn.

A good next step is to look for 4xx errors that have corresponding apache_error log entries. For example, a 403 status found in a 4xx log_event_type should correspond to a permission error in the Apache error log. Similarly, a 404 status code should correspond to a file not found error. Corresponding errors can be seen by replacing |timechart span=1h count BY log_event_type with |table _time log_event_type _raw.
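
With that replacement applied, the full search might look like the sketch below. The extra search clause that keeps only 4xx access events and error log entries, the sort, and the status column in the table are illustrative additions, not part of the original search.

```spl
tag=web OR tag=error (sourcetype=apache:error OR (sourcetype=apache:access status>299))
| eval status_group=case(status<300, "2xx", status<400, "3xx", status<500, "4xx", status<600, "5xx", true(), "unknown")
| eval log_event_type=if(sourcetype="apache:error", "apache_error", status_group)
| search log_event_type="4xx" OR log_event_type="apache_error"
| sort 0 _time
| table _time log_event_type status _raw
```

Sorting by _time places access log and error log entries next to each other in chronological order, so a 403 or 404 request and its likely error log counterpart tend to appear on adjacent rows.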

In the resulting table, you can look at related entries that are close in time for cause and effect. This information can be used for troubleshooting. 
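
One way to narrow down where to look is to find time buckets that contain both access log errors and error log entries. The sketch below assumes a 1-minute bucket is a reasonable proximity window; the field names error_log_events and access_error_events are hypothetical names chosen for this example.

```spl
tag=web OR tag=error (sourcetype=apache:error OR (sourcetype=apache:access status>299))
| eval status_group=case(status<300, "2xx", status<400, "3xx", status<500, "4xx", status<600, "5xx", true(), "unknown")
| eval log_event_type=if(sourcetype="apache:error", "apache_error", status_group)
| bin _time span=1m
| stats count(eval(log_event_type="apache_error")) AS error_log_events count(eval(log_event_type!="apache_error")) AS access_error_events BY _time
| where error_log_events>0 AND access_error_events>0
```

Each row in the result is a minute in which both sources reported a problem. Co-occurrence within a bucket does not prove cause and effect, so treat these rows as starting points for inspecting the raw events.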

Finally, you might be interested in other processes associated with the Managing web server performance use case.