Troubleshoot Mission-Critical Apps and Infrastructure
IT environments are made up of thousands of apps, servers, and virtual machines that produce high volumes of constantly changing data. That data is often stored in disconnected silos, each with its own monitoring tools. Your teams need fast, easy visibility across those data silos and monitoring tools to accurately detect and resolve incidents, and to avoid long mean times to detect and mean times to resolve.
When your data is siloed, complex interactions between infrastructure and app components go unnoticed. Similarly, relying on siloed monitoring views hinders your ITOps teams’ ability to determine the probable root causes of incidents. This leads to slow triage, redundant problems, and a poor downstream customer experience. You need a unified log and metric management solution that can bring in unstructured data from any source and break down silos at petabyte scale.
How can Splunk Enterprise, Splunk Cloud Platform, and Splunk ITOps apps help with troubleshooting mission-critical apps and infrastructure?
Provide holistic visibility of your machine data, logs and events, no matter the data source
The Splunk platform enables ITOps teams to tackle data sprawl. They can collect and index data from virtually any source and location, and do it at scale while managing cost. Data is stored in the Splunk index without sampling, which allows customers to analyze current and historical incidents. The Splunk platform helps ITOps teams optimize cloud infrastructure usage and spend with easy monitoring to pinpoint unusual spikes or trends.
To manage growing data volumes and cost, customers can filter, enrich, transform, and route just the data they want from the edge or cloud into the Splunk platform, or send it to third-party data lakes (such as Amazon S3), so that each use case gets access at the right level of performance. Combined ITOps and platform engineering teams can convert logs to metrics and freely analyze and correlate data without the limitations of conventional database structures.
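As a rough illustration of what converting logs to metrics involves, the following Python sketch filters malformed lines and aggregates the rest into per-host, per-minute metric points. It is not Splunk code and does not use any Splunk API. The log format, the field names, and the `logs_to_metrics` function are assumptions made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical log format, assumed for illustration only:
# "<ISO timestamp> <host> <HTTP status> <response time in ms>"
LOG_PATTERN = re.compile(
    r"^(?P<ts>\S+) (?P<host>\S+) (?P<status>\d{3}) (?P<ms>\d+)$"
)

def logs_to_metrics(lines):
    """Aggregate raw log lines into per-host, per-minute metric points."""
    buckets = defaultdict(lambda: {"count": 0, "errors": 0, "total_ms": 0})
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # filter: drop lines that do not fit the assumed format
        minute = match["ts"][:16]  # truncate "YYYY-MM-DDTHH:MM:SS" to the minute
        bucket = buckets[(match["host"], minute)]
        bucket["count"] += 1
        bucket["total_ms"] += int(match["ms"])
        if match["status"].startswith("5"):
            bucket["errors"] += 1  # count server-side errors (5xx)
    return [
        {
            "host": host,
            "minute": minute,
            "request_count": b["count"],
            "error_count": b["errors"],
            "avg_response_ms": b["total_ms"] / b["count"],
        }
        for (host, minute), b in sorted(buckets.items())
    ]

sample = [
    "2024-01-01T10:00:05 web01 200 120",
    "2024-01-01T10:00:40 web01 500 480",
    "2024-01-01T10:01:10 web01 200 100",
    "not a log line",
]
metrics = logs_to_metrics(sample)
for point in metrics:
    print(point)
```

The point of the sketch is the trade-off: four raw lines become two compact metric points that are cheap to store and chart, while the original log lines stay available wherever they were routed for deeper investigation.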
Finally, federated data access ensures that you’re not creating new data silos, and that you have easy access to all relevant data for unplanned investigations and longer term needs like audit or compliance. For a complete business view, teams can import reference data or trained artificial intelligence and machine learning algorithms from relational databases, data warehouses, or the data lake.
Help accelerate mean time to detection, investigation, and response
By centralizing data across tools and surfacing key risks, the Splunk platform empowers your ITOps teams to streamline and standardize workflows to reduce mean time-to-detection (MTTD) and mean time-to-response (MTTR). With the Splunk platform, ITOps teams have reduced MTTD by over 80% and reduced high priority incidents by over 50%, improving IT efficiency for competitive advantage and boosting customer experiences. Splunk Enterprise and Splunk Cloud Platform enable fast and extensive issue investigation for ITOps teams through the identification of emerging issues, deep root cause analysis, and rapid incident resolution. With schema-on-the-fly and a powerful search language, the Splunk platform allows you to quickly pinpoint incident start times, correlate across disparate data silos, and find the true root cause of incidents to help prevent them from recurring. Most competing monitoring tools focus only on basic metrics and availability monitoring. The Splunk platform goes deeper and helps ITOps teams get proactive notifications about system and application health, with rich insights found only within log and event data. ITOps and Engineering/DevOps teams can also build on logging capabilities in the Splunk platform by reusing logs for cloud-native application and infrastructure debugging in combination with traces and metrics through Splunk Log Observer Connect.
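To make the MTTD and MTTR figures concrete, here is a minimal Python sketch of how the two averages can be computed from incident timestamps. The incident records are invented for illustration. The sketch also assumes that detection time is measured from fault start to detection, and response time from detection to resolution. Organizations define these intervals differently, so treat the definitions as assumptions.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records (illustrative data, not from any real system).
incidents = [
    {
        "started": "2024-03-01T09:00:00",
        "detected": "2024-03-01T09:30:00",
        "resolved": "2024-03-01T11:00:00",
    },
    {
        "started": "2024-03-05T14:00:00",
        "detected": "2024-03-05T14:10:00",
        "resolved": "2024-03-05T15:00:00",
    },
]

def minutes_between(start, end):
    """Return the number of minutes between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60

def mean_times(records):
    """Return (MTTD, MTTR) in minutes under the assumed definitions."""
    # Assumption: MTTD = average of (detected - started)
    mttd = mean(minutes_between(r["started"], r["detected"]) for r in records)
    # Assumption: MTTR = average of (resolved - detected)
    mttr = mean(minutes_between(r["detected"], r["resolved"]) for r in records)
    return mttd, mttr

mttd, mttr = mean_times(incidents)
print(f"MTTD: {mttd:.0f} min, MTTR: {mttr:.0f} min")
```

With these two sample incidents, detection took 30 and 10 minutes and resolution took 90 and 50 minutes, so the averages are 20 and 70 minutes.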
Support operational resilience mandates and initiatives while keeping the customer’s production environments secure
Splunk Enterprise and Splunk Cloud Platform give your ITOps teams the data needed to safely and securely roll out and roll back changes at cloud scale. In addition, with approximately 1,000 purpose-built data source integrations and over 2,800 Splunkbase apps, you can extend the value of the Splunk platform as you evolve your business. The Splunk platform reduces risk to the production environment by letting teams run investigations and data analysis in the Splunk platform rather than directly on production systems. ITOps teams using the Splunk platform can easily revoke credentials from analysts who no longer need production system access, resulting in a more secure environment that is less prone to human error.
Optimize resources with informed, data-driven decision-making, while reducing manual and time-consuming tasks
The Splunk platform helps your teams and line-of-business executive stakeholders analyze machine data, so they can begin to understand how systems and services are performing. ITOps teams can reach this understanding without relying on business intelligence (BI) or reporting teams, which are often hampered by slow and brittle extract, transform, and load (ETL) processing. Splunk Enterprise and Splunk Cloud Platform custom compliance and reporting dashboards can efficiently scale to suit any enterprise ITOps team’s demands. Finally, the Splunk platform helps ITOps teams gain efficiencies by automating routine and time-consuming tasks. Through custom dashboards and reports, these teams can reduce manual tasks while proactively analyzing custom scripts developed by their teams.
Use case guidance
- Analyzing wire data from databases
- Learn how to analyze database queries using Splunk software, helping you identify performance issues, optimization opportunities and more.
- Investigating user login issues and account lockouts
- How to use Splunk software to set up searches to help you identify the root cause of these issues more quickly.
- Maintaining *nix systems with the Splunk platform
- How to monitor *nix systems running critical applications or services, with Splunk searches that you can save and run on a schedule.
- *Nix CPU utilization nearing capacity
- *Nix hosts with NFS connectivity issues
- *Nix host stops reporting data
- *Nix memory utilization nearing capacity
- All logs and events on a *nix host
- Expected *Nix process not running
- Filesystem mounts after *nix patching event
- Package installations and upgrades on a *nix server
- Processes running after *nix patching event
- Maintaining Microsoft Windows systems with the Splunk platform
- Use Windows data with your Splunk deployment to monitor patch management, software deployment, inventory tracking, remote access availability, and more.
- All Windows events on a host
- Current state of Windows services on a host
- Expected Windows process not running
- Failed Windows updates
- Microsoft recommended application log events
- Windows availability problems
- Windows CPU utilization nearing capacity
- Windows disk drive utilization nearing capacity
- Windows host stops reporting data
- Windows memory utilization nearing capacity
- Managing *nix system user account behavior
- Learn how to use *nix data with your Splunk deployment to track actions and events that are important for user account behavior management.
- Managing an Amazon Web Services environment
- How to use Splunk software to manage AWS, including EC2 instances, ELB instances, virtual private clouds, elastic block store volumes and more.
- Changes made to AWS cloud infrastructure
- Common AWS resource tags and tag values
- CPU utilization of Elastic Compute Cloud (EC2) instances
- Critical AWS Lambda metrics
- Current AWS elastic block store volumes
- Current AWS Elastic Compute Cloud (EC2) instances
- Current AWS elastic load balancer instances
- Current AWS virtual private cloud infrastructure
- Disabled AWS CloudTrail logging
- Geographic access to AWS S3 Buckets
- Health of AWS elastic load balancers
- Health of critical AWS infrastructure from CloudWatch metrics
- Logging output from AWS CloudWatch
- Logging output from AWS Lambda functions
- Missing AWS resource tags
- Public S3 bucket identification
- Resources with non-compliant AWS configuration rules
- Unattached AWS elastic block store volumes
- Unused Elastic IPs with no attached instances
- Users who haven't accessed AWS for an extended time
- Managing Azure cloud infrastructure
- How to use your Splunk deployment to manage all components of your Azure cloud infrastructure and provide you with necessary information and alerts.
- Azure Active Directory audit events
- Azure Active Directory users with no access for extended periods
- Azure critical infrastructure health
- Azure load balancers with no healthy instances
- Azure public storage blobs with anonymous access traffic
- Azure resources with non-compliant policy rules
- Azure resources with no associated tags
- Azure security policy review
- Azure storage blobs made public and by who
- Incorrectly provisioned virtual machines
- Inventory of Azure managed disks
- Inventory of Azure virtual machines
- Inventory of Azure virtual networks
- Inventory of unattached Azure managed disks
- List of Azure resource changes
- List of Azure resource network interface cards
- List of Azure resource public IP addresses
- List of Azure resource unused public IP addresses
- Logging output from any Azure Event Hub logs
- Visualisation of common Azure resource tags and tag values
- Managing Cisco IOS devices
- How to use Splunk to identify and resolve Cisco IOS device problems like duplicate IP addresses, duplex mismatches, overheating, port flapping and more.
- Managing Dell Isilon network attached storage
- Monitor Dell Isilon NAS metrics, including CPU utilization, cluster throughput, and anomalies like user access failures and other events shown in audit logs.
- Managing O365 workloads
- Start tracking operations performed on SharePoint to see how much usage it really gets, with this search you can run in Splunk software.
- Managing printers in a Windows environment
- Learn how to use Splunk software to monitor usage and functionality of printers on your Windows network.
- Measuring memory utilization by host
- How to track memory utilization of operating system processes so that you can troubleshoot or scale systems to avoid latency.
- Measuring storage I/O latency
- Identify underlying hardware issues when systems experience storage latency with this search you can run in Splunk.
- Measuring storage speed I/O utilization by host
- As a system administrator, you want to monitor disk operations and know when swapping occurs, so that you can tell when systems may slow down.
- Monitoring log volume trends
- Pinpoint server changes or issues by running this process in Splunk software to monitor log volume trends.
- Monitoring VMware virtualization infrastructure
- How to set up searches with Splunk software to monitor VMware virtual machine performance, with search snippets from Splunk experts.
- ESXi hosts with high CPU Ready summation value
- ESXi hosts with sustained high ballooning
- ESXi hosts with sustained high swapping
- ESXi host version identification
- Recently triggered vSphere alarms
- Topology of a VMware environment
- vCenter console logins
- Virtual machines with large file size utilization
- vMotion events for a specific virtual machine
- VMware datastores with highest utilization
- vSphere configuration changes
- Monitoring web application performance
- How to use Splunk software to gain visibility of your application health and monitor the availability, performance, and usage of your applications.
- Preparing for certificate-based authentication changes on Windows domain controllers
- This article provides a shortcut to implement the recommended mapping to X509IssuerSerialNumber using the Splunk platform, Excel, and PowerShell.
- Recovering lost visibility of IT infrastructure
- How to use Splunk after a malware attack to recover lost visibility into the health and operations of your infrastructure.
- Using stack traces to detect application errors
- Run these searches in Splunk software to investigate application errors in stack traces, helping you identify issues or trends that you should investigate.
- Using the Splunk platform to monitor key horse-related data points
- The Splunk platform can be used to monitor key data points related to the care and enjoyment of horses.
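
Several of the use cases listed above, such as "*Nix host stops reporting data" and "Windows host stops reporting data", can be thought of as comparing each host's most recent event time against a silence threshold. The Python sketch below shows that logic only. It is not the search used in those articles, and the host names, timestamps, `silent_hosts` function, and 15-minute threshold are all hypothetical.

```python
from datetime import datetime, timedelta

def silent_hosts(last_seen, now, max_silence=timedelta(minutes=15)):
    """Return hosts whose latest event is older than max_silence, sorted by name."""
    return sorted(
        host for host, seen_at in last_seen.items() if now - seen_at > max_silence
    )

# Hypothetical "latest event time per host" snapshot.
now = datetime(2024, 1, 1, 12, 0, 0)
last_seen = {
    "web01": datetime(2024, 1, 1, 11, 58, 0),  # reported 2 minutes ago
    "db01": datetime(2024, 1, 1, 11, 30, 0),   # reported 30 minutes ago
    "app01": datetime(2024, 1, 1, 9, 0, 0),    # reported 3 hours ago
}

stale = silent_hosts(last_seen, now)
print(stale)
```

In practice, the threshold would depend on how often each host is expected to send data. A host that reports once an hour needs a longer window than one that streams logs continuously.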