Windows host stops reporting data
Host availability is a critical aspect of IT operations monitoring. You want to monitor and alert on hosts that have become unavailable, either because they have gone down or because they have otherwise lost the ability to send data to your Splunk deployment.
- Microsoft: Windows event and update logs
In Splunk Enterprise or Splunk Cloud Platform, this procedure can operate on any event data that is consistently received from the host, including data from the Splunk Add-on for Microsoft Windows.
- In Splunk Enterprise or Splunk Cloud Platform, verify that you deployed the Splunk Add-on for Microsoft Windows to your search heads, indexers, and Splunk Universal Forwarders on the monitored systems. For more information, see About installing Splunk add-ons.
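- Run the following search. You can optimize it by specifying an index and adjusting the time range. Keep the time range shorter than the 12-hour lookback used in the append subsearch (for example, the last 15 minutes); otherwise every host seen in the last 12 hours also appears in the tstats results and no host is reported as missing.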
|tstats dc(host) AS val max(_time) AS _time WHERE index="<index to check>" host="<hosts to check>" BY host |append [|metadata type=hosts index="<index to check>" | table host lastTime | rename lastTime AS _time | where _time>now()-(60*60*12) | eval val=0] |stats max(val) AS val max(_time) AS _time by host | where val=0 | rename val AS "Has Data" | eval "Missing Duration"= tostring(now()-_time, "duration") | table host "Has Data" "Missing Duration"
The following table explains what each part of this search does. You can adjust the query to fit the specifics of your environment.
||tstats dc(host) AS val max(_time) AS _time WHERE index="<index to check>" host="<hosts to check>" BY host||Obtain a list of all hosts for which data has been recently received.|
||append [|metadata type=hosts index="<index to check>" | table host lastTime | rename lastTime AS _time | where _time>now()-(60*60*12) | eval val=0]||Obtain a list of all hosts that have sent data into the environment in the last 12 hours and add the results onto the previous results table.|
||stats max(val) AS val max(_time) AS _time by host||Create a table with a val column where val=1 if the data was seen for the host, and val=0 if not. Include a _time column that contains the timestamp of the most recently seen event for that host, and group by host.|
|| where val=0||Filter the results to only hosts not currently sending data.|
|| rename val AS "Has Data"||Rename the field as shown for better readability.|
|| eval "Missing Duration"= tostring(now()-_time, "duration")||Calculate the time elapsed since the host's most recent event and store it in Missing Duration as a string formatted as HH:MM:SS.|
|| table host "Has Data" "Missing Duration"||Display the results in a table with columns in the order shown.|
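To see how the Missing Duration calculation behaves in isolation, the following minimal sketch generates a single synthetic event with makeresults. It assumes a hypothetical host whose last event arrived 5400 seconds (90 minutes) ago; no indexed data is needed to run it.

```spl
| makeresults
| eval _time=now()-5400
| eval "Missing Duration"=tostring(now()-_time, "duration")
| table "Missing Duration"
```

Under that assumption, the result is a single row containing 01:30:00. The field name is quoted because it contains a space.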
The metric cpu.utilization is fundamental and should be present on all hosts. Create an alert based on this search so you can proactively manage potential stability issues. To alert when a host is no longer sending data, use one of the following two approaches:
- Use the SPL from this procedure to configure a Core Splunk alert.
- Build a new Vital Metric in IT Essentials Work for the desired entity type and configure vital metric alerting.
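As a rough sketch of the first option, the search can be adapted for a scheduled alert as shown below. The index name "windows", the host="*" filter, and the 15-minute detection window are illustrative assumptions, not values from this procedure, so replace them with your own. The inline earliest=-15m limits only the tstats portion, which lets you run the alert over a longer time range (for example, the last 12 hours) so that the metadata subsearch can still see hosts that have recently gone quiet.

```spl
| tstats dc(host) AS val max(_time) AS _time WHERE index="windows" host="*" earliest=-15m BY host
| append
    [| metadata type=hosts index="windows"
     | table host lastTime
     | rename lastTime AS _time
     | where _time>now()-(60*60*12)
     | eval val=0]
| stats max(val) AS val max(_time) AS _time BY host
| where val=0
| eval "Missing Duration"=tostring(now()-_time, "duration")
| table host "Missing Duration"
```

Because every remaining row represents a host that has stopped reporting, one reasonable trigger condition is for the alert to fire when the number of results is greater than 0. The "Has Data" column is omitted here because it is always 0 after the filter.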
Finally, you might be interested in other processes associated with the Maintaining Microsoft Windows systems use case.