Scenario: In your organization, many applications and services that are critical to the business are hosted on Microsoft Windows. Because workers and management rely on these applications and services, you need to monitor their availability and performance to make sure they function when needed. To do this, you need to search application and infrastructure logs, which are often disparate, for key indicators of failures and potential performance degradation. Because it is easy to get data into Splunk and then search and alert on key indicators, you are motivated to onboard data. After the data is available, you want to develop and save searches that help you achieve this type of monitoring efficiently.
How Splunk software can help
You can use Splunk software to monitor a large number of Windows system management tasks and events, such as patch management, software deployment, inventory tracking, remote access availability, and more.
What you need
To succeed in implementing this use case, you need the following dependencies, resources, and information.
The best person to implement this use case is a site reliability engineer or system administrator who is familiar with Windows servers and any other Windows applications used in the organization. This person might come from your team, a Splunk partner, or Splunk OnDemand Services.
Setting up Splunk software to maintain Microsoft Windows systems can take anywhere from a few hours to considerably longer, depending on the scale of your organization.
The following technologies, data, and integrations are useful in successfully implementing this use case:
- Splunk Enterprise or Splunk Cloud
- Data sources onboarded
  - Windows event logs
- Splunk Add-on for Microsoft Windows
- Splunk App for Windows Infrastructure
How to use Splunk software for this use case
You can run many searches with Splunk software to maintain Microsoft Windows systems. Depending on what information you have available, you might find it useful to identify some or all of the following:
Other steps you can take
To maximize their benefit, the how-to articles linked in the previous section likely need to tie into existing processes at your organization or become new standard processes. These processes commonly impact success with this use case:
- Active Directory administration, which is closely related to Windows maintenance
- The use of cloud services, such as Azure, to cover Windows maintenance requirements
- Integration with ticketing systems used for the service desk
- The use of any other related applications, such as MS SQL Server, IIS, Exchange, and O365, which can all affect a Windows environment
These additional Splunk resources might help you understand and implement this use case:
- Blog: Peeping through Windows (logs)
- Blog: Splunking Microsoft Azure Monitor Data - Part 1 - Azure Setup
- Blog: Splunking Microsoft Azure Monitor Data - Part 2 - Splunk Setup
- Docs: Monitoring Windows event log data
- Blog: What the WEF... Choosing Windows Event Forwarding or Splunk UF
- Conf talk: Tracking Logs at Zillow with Lookups & JIRA
How to assess your results
Measuring impact and benefit is critical to assessing the value of IT operations. The following are example metrics that can be useful to monitor when implementing this use case:
- Availability of service: Percentage of agreed service time during which the service was actually available (uptime compared to downtime)
- Maintainability of service: Mean time to repair (MTTR) and mean time between failures (MTBF)
- Additional metrics: Page load times, average response time, and operations per second
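As an illustration of how the availability and maintainability metrics above relate to each other, the following Python sketch computes them from a list of outage intervals. It is a minimal example under stated assumptions, not a Splunk feature: the function name `service_metrics`, the input format, and the sample outages are hypothetical, and MTBF is taken here as total uptime divided by the number of failures, which is one common convention.

```python
from datetime import datetime, timedelta


def service_metrics(outages, window_start, window_end):
    """Compute availability, MTTR, and MTBF for one service.

    outages: list of (start, end) datetime pairs. They are assumed to be
    non-overlapping and to fall entirely inside the agreed service window.
    """
    agreed_time = window_end - window_start
    # Sum the duration of every outage; start the sum from a zero timedelta.
    downtime = sum((end - start for start, end in outages), timedelta())
    uptime = agreed_time - downtime
    failures = len(outages)

    return {
        # Share of the agreed service time during which the service was up.
        "availability_pct": 100.0 * (uptime / agreed_time),
        # Mean time to repair: average outage duration.
        "mttr": downtime / failures if failures else None,
        # Mean time between failures: assumed here to be uptime per failure.
        "mtbf": uptime / failures if failures else None,
    }


# Hypothetical sample data: a 30-day window with two outages (2 h and 4 h).
window_start = datetime(2024, 1, 1)
window_end = window_start + timedelta(days=30)
outages = [
    (datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 12, 0)),
    (datetime(2024, 1, 20, 1, 0), datetime(2024, 1, 20, 5, 0)),
]

metrics = service_metrics(outages, window_start, window_end)
print(f"Availability: {metrics['availability_pct']:.2f}%")
print(f"MTTR: {metrics['mttr']}")
print(f"MTBF: {metrics['mtbf']}")
```

In practice, the outage intervals would come from your own monitoring data, for example the results of a saved search exported from Splunk software, rather than from a hard-coded list.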