Splunk Lantern

Windows availability problems

You might need quick information about Windows shutdowns and crashes when doing the following:

Prerequisites 

In order to execute this procedure in your environment, the following data, services, or apps are required:

Example

Windows uptime is extremely important to everyone at your organization. When basic Windows resources aren't functioning, productivity declines dramatically. You need to be able to quickly identify systems with availability issues due to unexpected shutdowns, application crashes, and hangs.

NOTE: To optimize the search shown below, you should specify an index and a time range. 

  1. Verify that you deployed the add-on to the search heads and Splunk Universal Forwarders on the monitored systems. For more information, see About installing Splunk add-ons.
  2. Run the following search: 
source=WinEventLog* "EventCode=1076" OR "EventCode=6008" OR "EventCode=1001" OR "EventCode=1002"
|rex field=Message "(?m)(?<cause>.*)$" 
|rex field=cause mode=sed "s/(at \d{1,2}:\d{1,2}:\d{1,2}.+was)/at (see events for times) was/g"
|stats count(EventCode) AS total_availability_issues values(cause) AS cause BY host, EventCode
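As the note above suggests, scoping the search to an index and a time range keeps it fast. The following is a minimal sketch of the same search with both added. The index name `wineventlog` and the 24-hour window are assumptions for illustration; substitute the index where your Windows event logs are stored and a time range that suits your investigation.

```spl
index=wineventlog earliest=-24h latest=now source=WinEventLog*
    ("EventCode=1076" OR "EventCode=6008" OR "EventCode=1001" OR "EventCode=1002")
| rex field=Message "(?m)(?<cause>.*)$"
| rex field=cause mode=sed "s/(at \d{1,2}:\d{1,2}:\d{1,2}.+was)/at (see events for times) was/g"
| stats count(EventCode) AS total_availability_issues values(cause) AS cause BY host, EventCode
```

The parentheses around the EventCode terms do not change the result; they only make the grouping of the OR terms explicit.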

Search explanation

The table provides an explanation of what each part of this search achieves. You can adjust this query based on the specifics of your environment.

Splunk Search Explanation

source=WinEventLog* 

Search only Windows event logs.

"EventCode=1076" OR "EventCode=6008" OR "EventCode=1001" OR "EventCode=1002" 

Search for unexpected shutdowns and application hangs or crashes.

Type=Error

Search for error events only. This optional filter is not part of the search shown above; add it after the EventCode terms if you need it, and omit it if no results are found.

|rex field=Message "(?m)(?<cause>.*)$" 

Extract the first line of the Message field in the event into a new field named “cause”.

|rex field=cause mode=sed "s/(at \d{1,2}:\d{1,2}:\d{1,2}.+was)/at (see events for times) was/g"

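Replace the specific time and date in unexpected-shutdown messages with generic text, so that otherwise identical causes collapse into a single value for better readability.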

|stats count(EventCode) AS total_availability_issues values(cause) AS cause BY host, EventCode

Count the number of availability errors and group them by host and event code.
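To see what the two rex commands do in isolation, you can run them against a synthetic event. This is a sketch only: the Message text is a made-up sample modeled on an unexpected-shutdown event, with a second line added to show that only the first line is kept.

```spl
| makeresults
| eval Message="The previous system shutdown at 11:02:34 AM on 1/2/2020 was unexpected." . urldecode("%0A") . "Additional detail on a second line."
| rex field=Message "(?m)(?<cause>.*)$"
| rex field=cause mode=sed "s/(at \d{1,2}:\d{1,2}:\d{1,2}.+was)/at (see events for times) was/g"
| table Message cause
```

The cause field should contain only the first line of Message, with the time and date replaced by "at (see events for times) was". Because every unexpected shutdown then produces the same cause text, values(cause) lists it once per host instead of once per event.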

Result

The following table shows sample results: the host, the EventCode, the total_availability_issues count, and the cause values, which are descriptive text extracted from the long Message field of the original events.

host EventCode total_availability_issues cause

busdev-001

1001

80

Detection of product '20130613', feature 'SetReceiver' failed during request for component '{4E76FF7E-AEBA-4C87-B788-CD47E5425B9D}'

Detection of product 'League of Legends.exe', feature 'SetReceiver' failed during request for component '{F3B1321E-2472-4211-8735-E1239BE41D9F}'

Detection of product 'webex.exe', feature 'SetReceiver' failed during request for component '{17BC5B75-6692-40E6-A347-849F595BC802}'

Event Name: AVSubmit 

Event Name: WindowsWcpOtherFailure3

Fault bucket -734962412

Fault bucket 91467906712

Not Available

coredev-002

1001

56

Detection of product 'spytech-spyagent.exe', feature 'SetReceiver' failed during request for component '{FD33EC178-D1B1-3396-99ED-G0BE1B0AA521}'

Fault bucket 124914201808

Fault bucket 125796201882

Fault bucket 125822201825

Fault bucket 128886201823

Not Available

dc-cup-01

1002

3

The DFS Replication service is starting.

As you can see, the cause field is rich in information. The event also carries a similar field, Event_Name, but it is less informative than the cause field. The extra detail is a result of the inline rex commands in the SPL for the search.

A good next step is to append the following to the search so that the hosts with the most issues appear at the top of the list:

|sort - total_availability_issues

Enriching the event with asset priority information from a lookup would also be a valuable next step in prioritizing mitigation efforts.  
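One way to combine both next steps is sketched below. The lookup name `asset_priority` and its output field `priority_rank` (a number, where 1 is the most critical asset) are hypothetical; they stand in for whatever asset lookup exists in your environment, keyed on host.

```spl
source=WinEventLog* "EventCode=1076" OR "EventCode=6008" OR "EventCode=1001" OR "EventCode=1002"
| rex field=Message "(?m)(?<cause>.*)$"
| rex field=cause mode=sed "s/(at \d{1,2}:\d{1,2}:\d{1,2}.+was)/at (see events for times) was/g"
| stats count(EventCode) AS total_availability_issues values(cause) AS cause BY host, EventCode
| lookup asset_priority host OUTPUT priority_rank
| sort priority_rank, -total_availability_issues
```

With this ordering, the most critical assets appear first, and within each priority level the hosts with the most issues are listed at the top. Hosts that are missing from the lookup have no priority_rank value, so check where they land in the sorted results before relying on the order.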
